Hmm wdym? Predicting the training-data generating process is basically all they're doing when actually learning (not the same thing as regurgitating the training data).
Posts by deen
So an LLM trained on many languages will be better than one trained or specialized for one, especially when there is such data disparity. The key then is to reduce hallucinations and ensure it can say "I don't know" about cultures that are mostly oral and largely unrecorded, so you can use it to safely zoom into subcultures of interest.
It might seem that way, but LLMs do better, and represent cultures better, with more data. There are also shared properties across all human languages and cultures that we don't so readily observe but that are captured in those abstract relational spaces.
But note, this isn't something introduced by LLMs. It is downstream of wealth disparities and of what the more privileged literati like to study and talk about, but also of simple things like population size. So LLMs know a lot more about West African mythology than Central Asian.
It's not as bad as you'd think. (The internet, youtube in particular, and really anything that reduces communication latency and enables the spread of ideas, has strong homogenization properties. But cultures are also robust. So hip-hop dances are globally dominant, but artists mash them with traditional styles creatively. In subsets of youtube you can find movies and tv shows from all over Africa and Southeast Asia. And in every LLM you can condition into a subspace with more concentrated knowledge of local languages, mythologies, customs and traditions than anything that came before, by far!)
No. Just split the task. Given enough data it's a simple thing to brute-force memorize a simple map from a token to its contents. There are: typos, joined words, educational spelling materials, translations to and from phonetic codes, acronyms, camelCase and more. Tons of signal to learn from over time.
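A toy sketch of mining one such signal (a hypothetical helper of mine, not anyone's actual pipeline): camelCase identifiers pair a fused word with its spelled-out parts for free, exactly the kind of incidental supervision that ties a token to the characters inside it.

```python
import re

def camelcase_pairs(text):
    # Find camelCase words and emit (fused word, spelled-out parts) pairs.
    pairs = []
    for word in re.findall(r"\b[a-z]+(?:[A-Z][a-z]+)+\b", text):
        parts = re.findall(r"[a-z]+|[A-Z][a-z]*", word)
        pairs.append((word, " ".join(p.lower() for p in parts)))
    return pairs

print(camelcase_pairs("use camelCase and getUserName here"))
# [('camelCase', 'camel case'), ('getUserName', 'get user name')]
```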
Gain of function for AI is stuff like continual learning, recurrence, and unobservable iterative refinement: stuff that makes them more expressive and opaque. Luckily the first is hard, the second faces major stability issues, and the last is hard to do flexibly. But folks keep chasing after them.
¯\_(ツ)_/¯
There's also, in the other direction, the fact that a transformer forward pass cannot compute P-complete functions (fundamentally sequential problems), and no amount of scaling helps with deep sequential dependencies. CoT extends LLMs to P, and so is most faithful for harder problems/computations.
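To make the sequential-dependency point concrete, a toy illustration of mine (not from the post): composing a long chain of shuffles, the S5 word problem, is the classic example believed to be beyond any fixed constant-depth forward pass for arbitrary chain lengths, while a step-by-step (CoT-like) loop handles it trivially.

```python
def compose(p, q):
    # (p after q): apply permutation q first, then p
    return [p[q[i]] for i in range(len(p))]

# A 5-cycle on {0..4}; composing it with itself 5 times returns the identity.
cycle = [1, 2, 3, 4, 0]
result = list(range(5))  # start from the identity permutation
for _ in range(5):
    result = compose(cycle, result)
print(result)  # [0, 1, 2, 3, 4]
```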
What of just joules?
Possibly. I don't know how to think about this properly yet. But what I'm vaguely pointing at is that I'm not as clear about what it means for a thing to exist as I used to be. Which also unclarifies what non-existence means too.
I disagree, I'm fairly certain that below the surface is an argument that's interesting to analyze. I'll revisit it again soon to be sure. I have a faint memory of there being a subtle aspect that most skipped over because the claim feels so obviously wrong.
To have sufficient static capacity to match this, we are in fact still several [(5+)?] orders of magnitude off. I'll revisit this again with carefully done calculations next time.
The brain not only changes its synaptic weights but also rewires its physical routing structure. This means the idea that we are close is only deceptively true. The brain, navigating a ~2^(10^15)-large state space, rewires more terabytes' worth of structure in a month than the LLM's total specification holds.
That's about 5 terabytes, which is only [small world network based estimation elided] about a factor of 50-60 off the human brain. A static snapshot in time of the human brain.
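The 5 terabytes is just unit arithmetic on the estimates from the posts below (~10^13 params at ~4 bits each); a quick back-of-envelope check:

```python
params = 1e13           # hypothesized total parameter count
bits_per_param = 4      # ~4 bits of information captured per parameter
total_bytes = params * bits_per_param / 8  # 8 bits per byte
print(total_bytes / 1e12)  # 5.0 (terabytes)
```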
The thing that makes this complicated is that while the LLM is fixed in its addressable space, the brain is plastic.
Something curious is that diffusion based pure image models can't count past 4-8, which is also about the subitizing limit in animals.
If it's true that a ~10^13 total param model exists, then: synapses hold about 4-6 bits each, and each parameter in an LLM captures about 4 bits of information.
Predictively relevant representations learned from text would likely already carry the most important bits.
I agree the world would be sampled more richly given more senses, but I'd be surprised if much more than what's already captured in text is required for modeling. Like how consciousness only processes some tens of bits per second at most, or how bad most of us are at drawing what we see/saw.
I do think that whole branch of philosophy/religion is surprisingly very relevant yes!
Tbh, I'm struggling with this too.
I suspect only a modest lift: slightly better at process explanations and a small boost to spatial reasoning, if at all (proprioceptive data is most important for that).
Multimodality is indeed a weak aspect of models. But it's not an afterthought--it's (oh dear this pattern is low class now isn't it?)--it's just difficult.
However, what independent/novel bits do you suppose exist in image and video data that is not already present in sufficient form in text?
Alas, only for easy/well-structured spaces. Which this problem was explicitly constructed to have.
(We can see buried evidence, i.e. the lack of well-defined progress signals, as the achilles heel. In essence, the LLMs have weaker priors but a highly structured relational model of what makes up the problem space, one aligned well enough with what they know. Novel combinations of those can then make progress.)
Hmm, depends on the structure of the solution space. For this problem, it seems there were lots of shallow solutions readily reachable (i.e. thinking of this as searching a decision tree). For problems with deeply buried evidence and/or sparse leaf nodes, the volume strat won't work anywhere near as well.
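The tree-search intuition in numbers (a toy model of mine, not from the thread): under uniform random descent, the chance of hitting a single solution node falls off as branching^-depth, so shallow solutions are cheap for a volume strategy while deeply buried ones are hopeless.

```python
def hit_probability(depth, branching):
    # chance that one uniformly random root-to-leaf walk
    # reaches the unique solution node at the given depth
    return branching ** -depth

print(hit_probability(2, 3))   # shallow solution: ~0.11
print(hit_probability(10, 3))  # deeply buried:    ~1.7e-05
```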
Can you explain?
As language models, the continuous ones do not work well, and the discrete ones don't change things (they work better, though still not as well as AR-LLMs; but they do make things wonderfully convoluted for the maybe-conscious-LLM insister).
There's also the side-possibility that knowing too much sometimes flips and makes some inference problems harder, and that a solver (person) worked one out not due to cleverness particularly, but because they just happened, by chance, to know the right things (and just those), which made the problem legible.
Where being super-intelligent and capable does not mean being able to select well across combinatorial decision spaces. In a sense there's a kind of arbitrariness to it, maybe linked to our roles as sophisticated observers embedded in this universe. Which is a deeper thing than we suspect?
I think it's more that good ideas are hard to find; then the ability to see one through is a further non-trivial challenge. In theory anyone can write a story, but a good story idea, and then polishing it into a novella, is much, much harder.
I also think Moravec's paradox will hit AIs hard here too.