Building on Kyle's and Jenn's responses – it seems to me the analogy is: grammaticality is to BLiMP and SyntaxGym as truth is to COMPS and plausibility (albeit that's not binary) is to EWoK. So, to apply our framework to those datasets, perhaps one should swap truth/plausibility for grammaticality?
Posts by Roger Levy
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 1: (a) Illustration of messages (left) and strings (right) in toy domain. Blue = grammatical strings. Red = ungrammatical strings. (b) Surprisal (negative log probability) assigned to toy strings by GPT-2."
New work to appear @ TACL!
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵👇
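To make the comparison concrete: surprisal, as in the figure caption, is the negative log probability a model assigns to a string. Below is a minimal sketch of how two strings would be compared. The per-token probabilities are invented for illustration and are not GPT-2 outputs or values from the paper; the choice of base-2 logs (bits) and the name `surprisal_bits` are assumptions.

```python
import math

def surprisal_bits(token_probs):
    """Total surprisal (negative log2 probability) of a string, given the
    conditional probability a model assigned to each of its tokens."""
    return -sum(math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities, made up for this example only.
grammatical = [0.02, 0.001, 0.05, 0.01]   # well-formed string with rare words
ungrammatical = [0.2, 0.1, 0.03, 0.3]     # ill-formed string with frequent words

s_gram = surprisal_bits(grammatical)
s_ungram = surprisal_bits(ungrammatical)

# Lower surprisal means higher probability under the model.
print(f"grammatical:   {s_gram:.2f} bits")
print(f"ungrammatical: {s_ungram:.2f} bits")
```

With these made-up numbers the ungrammatical string gets lower surprisal, i.e. higher probability, which is the kind of pattern the post describes. The sketch only shows how the comparison is computed; it does not explain why the pattern arises.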
thinking of calling this "The Illusion Illusion"
(more examples below)
Helps to be a linguist!
Been listening a lot to Ella Jenkins the last couple of days. What wonderful music and performance!
Within the past two weeks I deleted apps for media companies owned by 67% of the world’s richest billionaires, and it felt great!
Not quite sure what you mean by a “complete” corpus. I do think the basic philosophical assumptions of frequentist probability are applicable to corpora, using the large-numbers-of-native-speakers thought experiment.
And productivity is a property of the asymptotic distribution, if I’m getting you.
If there were enough native speakers of the language living at once, you’d quickly get enough instances of the prefix for relative frequency estimation of the next token distribution. Too few humans are alive for this in practice, but that’s not a problem for theoretical validity of the construct!
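A rough sketch of what relative frequency estimation of the next-token distribution looks like, assuming a tiny hypothetical corpus of tokenized sentences (the function name and the data are invented for illustration):

```python
from collections import Counter

def next_token_relative_freq(corpus, prefix):
    """Relative-frequency estimate of the next-token distribution:
    among sentences that start with `prefix`, count what comes next."""
    n = len(prefix)
    counts = Counter(
        sent[n]
        for sent in corpus
        if len(sent) > n and list(sent[:n]) == list(prefix)
    )
    total = sum(counts.values())
    # An unseen prefix yields an empty estimate: the sparsity problem.
    return {tok: c / total for tok, c in counts.items()} if total else {}

# Hypothetical toy corpus, not real data.
corpus = [
    "the dog barked".split(),
    "the dog slept".split(),
    "the dog barked loudly".split(),
    "the cat slept".split(),
]

dist = next_token_relative_freq(corpus, ["the", "dog"])
print(dist)
```

For any realistic prefix a real corpus returns the empty case, which is the practical limitation mentioned above; the estimator itself stays well defined as the number of speakers grows.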
You might be interested in this paper we did some time ago!
escholarship.org/content/qt69...
It supports your conjecture that, insofar as we think the “true distribution” is a valid theoretical construct (which I consider a highly defensible position), large-N Cloze would not give it to us.
Results of high stakes elections that happen only once every four years offer remarkable opportunities for overfitting theories of the electorate
The book The Patterns of Comics by Neil Cohn
Interior page from The Patterns of Comics by Neil Cohn
Back cover of The Patterns of Comics by Neil Cohn
It's my book's release day! The Patterns of Comics is now officially published, featuring an extended data-driven analysis of the structures used in 350+ comics from Asia, Europe, and North America analyzing diversity, regularity, and change over time www.visuallanguagelab.com/poc
screenshot of title and authors of paper + map with 18 colorful box callouts showing where datasets came from
GOOD MORNING BLUESKY!
Very excited about this new paper:
www.pnas.org/doi/10.1073/pnas.2300671120
Key Q: what predicts how much young kids (👶) talk?
How much 🗣 kids heard predicted how much 👶 talked, but other factors, e.g. mom’s education, didn’t. #PsychSci #DevPsy 🗣💬
INCOMING SUMMARY🧵ALERT 1/14
#linguistics Bluesky: what are the best available quantitative measures of dialect/language mutual intelligibility? The more fine-grained, the better: I'm hoping to vividly illustrate at least one specific dialect continuum (e.g., the Romance languages of the Mediterranean coast)
Today in linguists are NOT KIDDING when we say that your capacity for language enables you to understand sentences that have never before been uttered in human history.
New postdoc opportunity to work jointly with @cantlonlab.bsky.social and me to understand cognition across species, age, and culture! cmu.wd5.myworkdayjobs.com/CMU/job/Pitt...
Absolutely, the Nature EiC has it completely backwards. Checking for errors and quality of data (and of math, code, and argumentation) is the most important work that reviewers can do.
Screenshot of portion of article linked to in post, where Nature EiC says that checking underlying data is not the job of peer review.
The quotes from Nature EiC Magdalena Skipper about whether journals should be checking for errors/data quality as part of peer review are quite surprising to me.
www.wsj.com/science/whats-wrong-with...
“This significant effect was found using a post hoc weighting procedure aligned with our overarching hypothesis”?!?
Snow geese fill the sky at sunset in Washington's Skagit Valley.
I am delighted to announce that the Department of Biology at the University of Washington is advertising for a tenure-track assistant professor position on the quantitative understanding of collective behavior.
I will be chairing the search; details are here: apply.interfolio.com/130336
I think it’s amazing that Cognitive Science gives recent PhDs $10K in UNRESTRICTED CASH…right when folks are broke, exhausted, moving town…and need it most.
It’s almost Glushko season!
cognitivesciencesociety.org/glushko-diss...
First post and big news - I am starting as an Assistant Professor in Psychology at Georgia Tech in Jan 2024!
www.language-intelligence-thought.net
In a new TiCS article, @emaliemcmahon.bsky.social and I review a growing body of behavioral, neural, and computational evidence that social interactions are automatically extracted by the human visual system:
tinyurl.com/nhh2dhxt
#PsychSciSky #NeuroSkyence
While the world has its eyes on the Middle East, democratic conditions in Indonesia are looking grim. The Supreme Court has overruled the Constitution in order to allow the sitting president's son to stand as vice-presidential candidate alongside a disgraced general with a stained human rights record.
I am reading PhD applications this year, with a special interest in students who would like to work on the topic of perceived danger. But open to all applicants who share some of my interests. Visit www.liulaboratory.org to see papers, lab values, and tips for application writing.
The Consumer Financial Protection Bureau (CFPB) is hiring a section chief for the psych/“behavioral” section and I'd love to see some CogSci representation in there! My brother works there (trained as an economist) and it's an incredible gig doing research in the public interest.
Santa Fe Institute now has a Bluesky account: @sfiscience.bsky.social
A new cross-linguistic study on demonstratives by a team of psychologists and linguists: "Commonalities and differences across languages in spatial communication can be understood in terms of universal constraints on action shaping spatial language and cognition." www.nature.com/articles/s41...
New paper out!
"Large language models show human-like content biases in transmission chain experiments"
#CulturalEvolution #cssky 🧪
www.pnas.org/doi/10.1073/...
new work just dropped, see @stephan-meylan.bsky.social's "thread" below:)
#DevPsych #CogPsych #PsychSciSky #CogSciSky
Thrilled at publication of @stephan-meylan.bsky.social's "How adults understand what young children say", featuring Bayesian noisy-channel inference, LLMs, & child speech datasets!
TL;DR: prior expectations of what kids *want to say* are crucial. (Knowing how kids mispronounce words is too.)