
Posts by KS

Shared functional specialization in transformer-based language models and the human brain - Nature Communications The extent to which transformer-based language models provide a good model of human brain activity during natural language comprehension is unclear. Here, the authors show that the internal transforma...

True, but there is some cool research to this end.
www.nature.com/articles/s41...

4 days ago 1 0 0 0

Not the modality. The fundamental difference is embodiment in the real world and continual learning. Also smell and taste, though these are likely minor.
bsky.app/profile/kazz...

4 days ago 1 0 0 0

Frontier LLMs mix language, code, math, images, and audio. They are also policies (trained with reinforcement learning to solve classes of tasks rather than just predict the best next word). "Predictive text generation based on language" is incorrect; code and math are different beasts from language.

4 days ago 0 0 0 1

The first part is absolutely correct. The second part is also correct, but not because AI models are "mostly incorrect"; au contraire. I think not recognizing how powerful these models are is also a problem, because then you don't realize the problems they pose in preventing learning.

4 days ago 0 0 0 0

The inaccuracy part is not true for frontier AI models. It was absolutely true until last year, but the current crop of AI models have tool-use ability, so they simply web-search and collate as needed.

That said, frontier AI models can make you a lot dumber if you outsource your thinking.

4 days ago 0 0 0 0

1/n New paper - V-GIFT 🎁

Self-supervised tasks like rotation prediction or colorization were big in 2018.
Do they still matter?

Yes.
We turn them into visual instruction tuning data for MLLMs.

Result: models rely more on the image and perform better on vision tasks 👀

4 days ago 22 7 1 1

Tucker is a Christian Nationalist but evidently committed to the Constitutional separation of Church and State - to prevent corruption of the Church. By no means an ally; actually a different kind of adversary. But calling him a "Nazi" is too flippant.

4 days ago 0 0 0 0

Exactly the right question. AI tools will make you dumber unless you check and direct every single thing - which does make technical work a lot faster - but you also learn in the process of trying things out. Basically, expertise is likely to increase if you have it.

5 days ago 0 0 0 0

This is not accurate, however. Interpolating through data/reasoning space is not the same as "it outputs what we feed them". Easily checked with a $20/month subscription for one month to either Codex or Claude Code. Yes, you are paying thieves, but it's only for a month.

5 days ago 1 0 0 0

75% of the software industry and most of STEM research. So yes, "they are out there" is accurate. And no, they almost certainly cannot be reasoned with, as they are all full-time active users. Even the world's most famous mathematician, Terence Tao, is all in.

5 days ago 6 0 0 0

They are primarily *corporate*-owned - i.e., VCs, not charities, except one. They expect a return.

5 days ago 0 0 0 0

The war has 40%+ support in the US and 90%+ support among Jewish Israelis, yet both are symmetrically fighting this war. It's not a completely insane narrative to have emerged.

1 month ago 0 0 0 0

The Rs problem was actually solved as of Sonnet 4.5.

1 month ago 1 0 0 0

Wow alright, would definitely want to give this a try.

1 month ago 1 0 0 0

Cool. So are there any speed gains? If so, I am curious where they are coming from.

1 month ago 1 0 1 0

I was stupid in my choice of phrase here. I meant math/chemistry symbols, equations, genetics (nucleotide or amino acid sequences), chess, code, etc. VLMs (which all frontier models are) tokenize images and treat them like language. The math-specific comment is here: bsky.app/profile/kazz...

1 month ago 0 0 0 0

Also, I can continue these arguments ad infinitum; I have an obsessive need to continue every thread of discussion forever. I am not badgering you, even if I give that impression.
We can stop this anytime if you say so, and I will stop replying.

1 month ago 0 0 0 0
A Claude Opus 4.6 prompt which only gives the integral to be evaluated in latex and no natural language at all. The agent accurately identifies the task and performs the integral.

Actually, a math prompt without a single word of natural language is correctly interpreted and solved by Opus 4.6 (it's not a difficult integral; I'm just making a point). I think the Kean et al. study makes the argument that language and the language of thought are different things.

1 month ago 0 0 0 0

As I said, calling math natural language is, I think, too broad and not how the brain processes it (Kean et al. ref). E.g., Emily is a computational linguistics and structure expert but, unless I am mistaken, not a mathematician. I can feed a math prompt where the only natural language is "Solve".

1 month ago 0 0 1 1

I am making this point based on my personal experience (I know, not evidence). I have no idea if Emily is aware, because she does not make this distinction in her posts. You might find this tiresome, but this has been the only determinant of these models being useful to me.

1 month ago 0 0 0 0

It is actually impossible for humans to have an unbiased opinion. Even the very act of seeing (same for other sensory experiences) involves enormous processing through the visual cortex. So I would never ever make this claim. You should not believe me. Expert opinion is still *opinion*.

1 month ago 0 0 0 0

No question. But anecdotal experiences and heuristics are often the starting point of research programs. Emily's trichotomy of what she believes are the only three possible uses of LLM policies also seems anecdotal (I checked her Scholar page and found no manuscript supporting her claim).

1 month ago 0 1 0 0

Regarding this point, here's the thing: the reason I personally found models before Opus 4.6 (Nov 25) useless is precisely the hallucinations. But I rarely see any in 4.6. Perhaps it is extended tool use (why make up a reference if you can search for it with the web tool?).

1 month ago 0 0 1 0

The jump from Sonnet 4.5 to the newer models, Opus 4.5 and especially 4.6, seems enormous, unfathomable even, from my personal perspective. If I had to hazard a guess: if Emily or any other AI safety researcher is not working with the newest models, they would get a skewed perspective of their potency.

1 month ago 0 0 1 0

I should be clear that this was not the case before this year. Models before Opus 4.5 (late Nov 2025 release) were not useful for my work; they were simply not performant enough. Something like Claude Code is very useful, but it is not just an LLM; it is an elaborate harness around an LLM.

1 month ago 0 0 1 0

Personally, my ADHD means I have been able to remove all the stumbling blocks that in the past would stymie me for simple reasons; now they are easily addressed. My work output/efficiency has increased 10X with, I feel, greater and deeper understanding, even if you find that impossible to believe.

1 month ago 0 0 2 0

Ultimately, this AGI discourse is, in my opinion, largely a waste of time. The only question is about *utility*. And I think these are incredibly useful for scientists and engineers, which is why there are *so many* users. Every collaborator I have is already using Claude Code heavily.

1 month ago 0 0 1 0

I work mostly with Opus 4.6, and it seamlessly uses a RAG tool (a guess, but clearly something like it) to pull relevant context from different conversations, which is already impressive. With tool use, interpolating through the training set is pretty powerful, as I can attest from my own work.

1 month ago 0 0 1 0
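The retrieval behavior I'm guessing at above can be sketched minimally like this. A toy illustration only: the bag-of-words "embedding", the in-memory store, and the snippet texts are all placeholders, not how any frontier model actually implements cross-conversation RAG.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    # A real system would use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, conversations, k=2):
    # Rank stored conversation snippets by similarity to the query
    # and return the top-k as extra context for the model.
    q = embed(query)
    scored = sorted(conversations, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:k]

past = [
    "we discussed the integral of x^2 over [0, 1]",
    "notes on tokenizing images in VLMs",
    "grocery list: eggs, milk, bread",
]
print(retrieve("how do VLMs tokenize images?", past, k=1))
```

The point is only that retrieval is itself a tool call the policy can decide to make; the ranking function is interchangeable.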

So now these models are LLM policies that have to make *decisions*: when to use a Python tool, when to perform a web search, or when to more directly run the newer visual designer (also reasoning on code - it's HTML + SVG, likely trained specifically with RLHF given how good it is).

1 month ago 0 0 1 0

The most interesting feature is *tool use* - the most RLHF-heavy concept in LLMs, because an LLM trained only on a next-word objective cannot simply do this. You need a policy that decides when to use tools automatically, and these models are really good at this, even with newer tools you add via instructions.

1 month ago 0 0 1 0
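The decide-then-call loop I mean can be sketched roughly like this. Everything here is invented for illustration: the tool names, the rule-based stand-in for what is really a trained policy, and the dispatch logic.

```python
def calculator(expr):
    # Evaluate a simple arithmetic expression (toy tool).
    return str(eval(expr, {"__builtins__": {}}))

def web_search(query):
    # Stand-in for a real web-search tool.
    return f"[search results for: {query}]"

TOOLS = {"calculator": calculator, "web_search": web_search}

def policy(prompt):
    # A real LLM policy is trained (heavily via RLHF) to emit either a
    # tool call or a final answer; here we fake that decision with rules.
    stripped = prompt.replace(" ", "").strip("+-*/0123456789.()")
    if any(ch in prompt for ch in "+-*/") and stripped == "":
        return ("calculator", prompt)
    if prompt.lower().startswith(("who", "what", "when")):
        return ("web_search", prompt)
    return ("answer", prompt)

def agent(prompt):
    # One step of the loop: the policy picks an action, the harness
    # executes the corresponding tool or returns the answer directly.
    action, arg = policy(prompt)
    if action == "answer":
        return arg
    return TOOLS[action](arg)

print(agent("2 + 3 * 4"))                          # dispatches to calculator
print(agent("Who proved Fermat's Last Theorem?"))  # dispatches to web_search
```

The key design point is that "which tool, if any" is itself a model output, not hard-coded routing; a harness like Claude Code just executes whatever call the policy emits and feeds the result back.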