This is an excellent summary of the current situation in the US war on Iran.
I’m an existentialist and borderline absurdist and even I struggle with the current moment.
This summary is just fantastic.
how do i get from here to the s-tier 🤚
like i can easily see every human author thinking most other authors just don't get what's cool or important about *their* work.
but i can also see it going the other way, where everyone converges and the llms are out there doing their own thing.
the thought experiment i'd love to see made real is: imagine you had 100 professional authors do this exercise for one another's books. then you had SoTA AIs do all 100.
would people consistently think the AIs are worse than the humans? i genuinely don't know... but it isn't obvious they would
Claude has strong opinions about what helps: claude.ai/share/e2cf6f...
whether they're actually correct, i don't know
what happens if you try explicitly telling them to be very specific, and to feel free to make choices about content? i'm curious whether, vis-à-vis your original post, it's really that you need to specify a huge amount of detail in the prompt, vs. finding shortcuts that "license" the model to do it
but if you look at how writers in general talk about other writers' work, it also has a lot of the same flavor... and i worry that people conflate "the llm can't generate good writing" with "actually i think almost no one else can generate writing i'd approve of, in this narrow context"
i guess this is where i kind of feel like the subjective and personal nature of it becomes really huge. i also have the experience that the models are bad at mimicking *my* writing. and yet i feel like they're pretty good at mimicking the style of other writers i like—where i'm not as invested?
interesting! how much of this do you think is llm-specific? meaning, do you think if you took 100 human non-fiction authors and asked them to do the same task, you'd be happy with most of the results?
in a sense i think this maybe gets at what a remaining role of domain expertise is (for now). a lot of variance in writing style and creativity *is* already in the model weights... but someone who's read a lot is at a huge advantage in terms of eliciting interesting results from a compact prompt
i was recently working on a product where the stakeholders were like "this writing is AI slop", so i just changed the prompt to be much more stylized, and then the feedback turned into "it's too stylized and opinionated", which i think highlights the subjective nature of what "good" writing is
i'm struck by how often i see people complain about how generic the llm writing style is (not you, of course), and it's clear the critic hasn't even tried to direct the style in any way. of course it's going to give you the lowest common denominator--that's the smart thing to do!
i think that's true (and again generally so) in the sense that, absent a signal from the user, mode collapse is the sensible strategy. but at paragraph or short story level at least, you don't need a long prompt... you can just say "in the manner of Kafka" and get radically different results
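concretely, something like this toy sketch (generate() here is a made-up stand-in for whatever model API you'd actually call; the only thing that changes between calls is the one-line style cue):

```python
# toy sketch: same request, radically different output, steered by one line.
# generate() is a stand-in for any LLM chat/completion API, not a real library.

def generate(prompt: str) -> str:
    # placeholder: swap in a real model call here
    return f"[model output for: {prompt[:60]}...]"

base = "Write one paragraph about a man who wakes up late for work."

for cue in ("", "In the manner of Kafka. ", "In the manner of Hemingway. "):
    print(f"--- cue: {cue or '(none)'}")
    print(generate(cue + base))
```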
i find it more natural to just mentally separate raw intelligence from concrete application (even though, in the limit, the distinction will collapse). a model can be very smart, but it still needs some organizing skills and structures to write a novel, because that's a very hard, specific thing
i guess, but you could say the same for almost any really difficult task? like, the model won't just solve Clay problems for you (even if it could, with the perfect prompting), because it's trying to stay close to the prompt. it's true, but feels kind of empty?
from a (strictly) data standpoint i would guess that writing quality is actually one of the easier things to solve for, since most of the training data *is* writing. but distinguishing good from bad in a way that respects people's differing aesthetic preferences seems really hard
not really, a lot of the performance gains are now driven by RL, and the typical setup there is either human experts paid to generate good examples, or the models themselves generating candidates that humans (or other models) then evaluate (but also, the models train on huge book corpora)
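schematically, that second setup looks something like the sketch below; policy_sample() and judge_score() are made-up stand-ins for the generating model and the human/model evaluator, not any lab's real pipeline:

```python
# sketch of the generate-candidates-then-evaluate loop described above.
import random

def policy_sample(prompt: str) -> str:
    # stand-in for sampling a candidate from the model being trained
    return f"candidate #{random.randint(0, 999)} for: {prompt}"

def judge_score(candidate: str) -> float:
    # stand-in for a human rater or reward model returning a scalar score
    return random.random()

def collect_preference_pair(prompt: str, n: int = 4):
    candidates = [policy_sample(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=judge_score, reverse=True)
    # (best, worst) pairs like this are what preference-based objectives
    # (e.g. DPO-style losses) are trained on
    return ranked[0], ranked[-1]

chosen, rejected = collect_preference_pair("write a vivid opening paragraph")
```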
crazy that the latest models from the big labs have all been minor version bumps on paper despite huge improvements in benchmark performance and qualitative feel.
hard to imagine the labs releasing major versions that feel incremental at this point, which is... terrifying?
buuuut even then, i'm skeptical that any particular AI, short of genuine ASI, would ever be *widely* received as a good writer, for the same reason that even the most popular human writers are usually appreciated by only a small subset of people. it's just inherently subjective.
which is just to say that, even if base model development halted in its current state, we probably *will* still get a good AI novelist in the next couple of years. it's just that it's not trivial to build the right harness, and not a lot of resources are being thrown at that particular problem.
of the 3, the harness is probably by far the easiest way to keep a narrative on the rails as more information is introduced. this is pretty much how human writers operate too! human novelists don't sit down and spew out 100K words linearly. it's an enormous process of iteration and refinement.
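a toy sketch of the kind of harness i mean, with generate() again a made-up stand-in for any model call (the real work is in the prompts and the state management, which this waves away):

```python
# toy harness: plan -> draft -> critique -> revise, chapter by chapter,
# carrying a running summary forward so later chapters stay consistent.
# generate() is a stand-in for any LLM API; nothing here is a real system.

def generate(prompt: str) -> str:
    # placeholder: swap in a real model call here
    return f"[model output for: {prompt[:60]}...]"

def write_novel(premise: str, n_chapters: int = 12, n_revisions: int = 2) -> str:
    outline = generate(f"Outline a {n_chapters}-chapter novel about: {premise}")
    state = f"PREMISE: {premise}\nOUTLINE: {outline}"  # the harness's running memory
    chapters = []
    for i in range(1, n_chapters + 1):
        draft = generate(f"{state}\n\nWrite chapter {i} in full.")
        for _ in range(n_revisions):  # iterate, don't emit one linear pass
            critique = generate(f"{state}\n\nCritique chapter {i}:\n{draft}")
            draft = generate(f"{state}\n\nRevise chapter {i} using this critique:\n{critique}\n\n{draft}")
        # fold a compressed summary back into the state to preserve continuity
        state += f"\nCHAPTER {i}: " + generate(f"Summarize chapter {i} for continuity:\n{draft}")
        chapters.append(draft)
    return "\n\n".join(chapters)
```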
3. it's arguably more of a harness problem than a base model problem. i think @moultano.bsky.social's point that information needs to come from somewhere is correct, buuut... it doesn't have to be in the prompt! it could be in the weights (hard for reason (2)), or it could be in the harness.
one way to think about it is that the manifold you need to learn in the latent space is much larger, because there are many more ways to be good. you probably need an incredible amount of feedback from expert judges, and you have the standard chaining problem, but with very few intermediate labels.
2. for reasons related to the above, learning to produce coherent and high-quality long form writing via pretraining or RL is probably much harder than learning to generate code or even do math. lack of verifiability, and the inherent subjectivity of writing, really hurts you here.
whereas if AI writes a broadly coherent 120K-word novel, nobody is impressed unless they also like the quality of individual paragraphs. so an AI writer has to be able to write like almost *every* human writer to be "good", and it has to produce high quality writing fractally, *at every scale*.
1. the problem is just way harder. good writing is mostly an aesthetic judgment, unlike code or science, where correctness is often verifiable. people are much less forgiving! if AI writes 120K words of code that *works*, devs are impressed even if they think it's insecure spaghetti code.
this is interesting, but feels at best incomplete. i think there are at least 3 separate reasons we haven't yet seen consistently good writing from AI, at least in long form (microfiction is arguably mostly solved):
🧵
i do think that there's ultimately an empirical question here, which is "will these kinds of errors prevent us from using LLMs to solve really hard problems", and it sounds like we have different predictions there. which is fine! i see no point in arguing about that, we can just wait and see.
and re: your question about what i'm saying more broadly, it's that i think you're oversimplifying the dominant perspectives on LLMs (on both sides). most "boosters" don't think LLMs no longer make errors, or that their ultimate utility depends on having guarantees about correctness.