Yippeeee! Bravo! Admit it, they handed you the diploma the moment they saw you in that getup.
Posts by Jean Barré
Google dropped 4 different Gemma open-weight models! I'm most excited that they're finally adopting a standard Apache 2.0 open source license.
huggingface.co/collections/...
#jobklaxon
PhD Fellowships available at @psl-univ.bsky.social!
Come & work w/ us in CultureLab (w/ @jbcamps.bsky.social @oliviermorin.bsky.social et al., not on bsky).
Deadline 31/05
Starting 01/10
Duration 3 yrs (fixed-term employment with benefits)
Salary: sad.
www.culturelab.psl.eu/en/news-cult...
#jobklaxon
10 postdocs available at @psl-univ.bsky.social in #AI.
Working language: English.
Btw. #digitalhumanities also happen under the AI hype these days, so get in touch and walk by this very building on your way to work. Join me in being #touristatwork #juniorProfInParis
Out in Evolutionary Human Sciences! With @mikekestemont.bsky.social, @jbcamps.bsky.social, @remcosleiderink.bsky.social & Anne Chao
New work applying unseen species models to cultural heritage, tackling the question: how many stories were _shared_ between medieval French and Dutch literature?
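For readers new to the ecological toolkit: unseen species models estimate how many "species" (here, stories) went unobserved, based on how many were seen only once or twice. Below is a minimal Python sketch of the classic Chao1 estimator, offered only as a generic illustration; the estimator the paper uses for stories *shared* between two literatures is not described in this post and may well differ. The abundance counts (e.g. surviving witnesses per story) are hypothetical toy data.

```python
from collections import Counter


def chao1(abundances):
    """Classic Chao1 lower-bound estimate of total richness.

    abundances: how many times each *observed* item (e.g. a story) is attested,
    for instance the number of surviving manuscripts per story (hypothetical).
    """
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)            # items actually observed
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]      # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected variant, used when there are no doubletons
    return s_obs + f1 * (f1 - 1) / 2


# Toy data: six observed stories, three of them attested only once.
print(chao1([1, 1, 1, 2, 2, 5]))  # 8.25
```

Intuitively, many singletons relative to doubletons signal that a large part of the population is still unseen.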
lnkd.in/exyAWtir
I'm on a 38(!)-author paper just published in Frontiers in Artificial Intelligence, "Computational hermeneutics: evaluating generative AI as a cultural technology". We splice Schleiermacher and hermeneutic theory into AI debates, arguing that AI systems are "context machines".
www.frontiersin.org/journals/art...
From time to time I mutter about a secret project that involves benchmarks and historical language models. Here's a formal announcement of the Schmidt Sciences grant. Other PIs include @dmimno.bsky.social, @lauraknelson.bsky.social, @andrewpiper.bsky.social, and @mattwilkens.bsky.social. And +
What is the relationship between memorization and generalization in AI? Is there a fundamental tradeoff? In infinitefaculty.substack.com/p/memorizati... I’ve reviewed some of the evolving perspectives on memorization & generalization in machine learning, from classic perspectives through LLMs.
A bit ironically, we end up learning as much about the researchers (and their implicit representations) as about the characters themselves.
Kudos to @antoine-bourgois.bsky.social, first author & 1st-year PhD student 👏
📂 Data + code → github.com/lattice-8094/fictional-character-ontology
This flips the similarity question on its head.
It's no longer: "Are these two characters similar?"
But rather: "Along which dimensions did the scholars who designed this benchmark implicitly define similarity between these two characters?"
Box plot showing accuracy scores for all combinations of 1 to 17 ontological character attribute classes. Three trend lines track maximum (green), mean (red), and minimum (orange) accuracy. The key finding: maximum accuracy peaks at ~0.96 with just 4–5 carefully selected classes, while using all 17 classes simultaneously yields no improvement over the mean (~0.64). A small, well-chosen subset of ontological dimensions outperforms the full representation — more is not better.
We then built on @dbamman.bsky.social et al.'s (2014) triplets protocol and introduced CharaSim-fr, a brand-new benchmark for character similarity
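To make the triplet idea concrete, here is a minimal Python sketch of one common formulation: each triplet (A, B, C) encodes the judgement "A is more similar to B than to C", and a character representation scores a point when cosine similarity agrees. The vectors and character ids are toy assumptions, not CharaSim-fr data, and the benchmark's exact scoring may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_accuracy(embeddings, triplets):
    """Share of triplets (a, b, c) for which a is strictly closer to b than to c.

    embeddings: dict mapping a character id to its vector (hypothetical toy data).
    """
    correct = sum(
        cosine(embeddings[a], embeddings[b]) > cosine(embeddings[a], embeddings[c])
        for a, b, c in triplets
    )
    return correct / len(triplets)


toy = {"A": [1.0, 0.0, 0.0], "B": [0.9, 0.1, 0.0], "C": [0.0, 1.0, 0.0], "D": [0.0, 0.0, 1.0]}
triplets = [("A", "B", "C"), ("C", "D", "A"), ("B", "A", "D")]
# The second triplet is a tie (both similarities are 0), so it counts as a miss.
print(triplet_accuracy(toy, triplets))  # two of the three toy triplets are satisfied
```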
Key method: we exhaustively compute all possible mixes of concatenated ontological classes (131,054 combinations!) to find the optimal similarity signal
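The size of that search space is easy to check. With 17 classes, the number of subsets containing at least two classes is $\sum_{k=2}^{17}\binom{17}{k} = 2^{17} - 1 - 17 = 131{,}054$, which matches the figure quoted in this post (including the 17 single-class subsets would give 131,071). The Python sketch below verifies that count and shows a generic exhaustive search; the `score` callable, the class subset and the toy scorer are illustrative assumptions standing in for something like triplet accuracy on concatenated class vectors.

```python
from itertools import combinations
from math import comb

N_CLASSES = 17
# Subsets of at least two classes: 2**17 - 1 (non-empty) - 17 (singletons)
print(sum(comb(N_CLASSES, k) for k in range(2, N_CLASSES + 1)))  # 131054


def best_subset(classes, score, min_size=2):
    """Evaluate every subset of `classes` with at least `min_size` members
    and return the highest-scoring subset together with its score.

    score: hypothetical callable, e.g. triplet accuracy obtained by
    concatenating the per-class character vectors of the subset.
    """
    best, best_score = None, float("-inf")
    for k in range(min_size, len(classes) + 1):
        for subset in combinations(classes, k):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score


def toy_score(subset):
    # Toy scorer (assumption): rewards two classes, slightly penalises subset size.
    return len({"actions", "relations"} & set(subset)) - 0.01 * len(subset)


classes = ["actions", "emotions", "personality traits", "relations", "cognition"]
print(best_subset(classes, toy_score))
```

Exhaustive enumeration stays tractable here because 131,054 subsets is small; with many more classes one would need a greedy or sampled search instead.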
So we built an ontology of characterization w/ 17 classes grounded in narratological theory: actions, emotions, personality traits, relations, cognition, objects, body parts, ++
The goal: stop treating characters as bags of features and start asking along which dimensions they resemble each other.
≠ Rosanette: less refined, more eroticized, defined largely through dialogue
≠ Mme Arnoux: the central figure — the other two are satellites who exist mainly by contrast with her
≠ Mme Dambreuse: far more effaced, valued by Frédéric mainly for the social access she provides — little interiority
The story starts with an argument among co-authors about three women in Flaubert's L'Éducation sentimentale: Mme Arnoux, Rosanette, Mme Dambreuse.
The question: who is the most dissimilar in this triplet?
Three defensible answers
That is exactly where the tears began
Sweat, because we tried to grasp what it actually means for two characters to be "similar", measured across 100,000+ characters.
The standard approach? Character embeddings, cosine similarity, done.
We were so naive. So wrong.
New article! "Toward an Ontological Representation of Fictional Characters" by @antoine-bourgois.bsky.social, me, @oseminck.bsky.social & @tpoibeau.bsky.social
doi.org/10.1017/chr....
Nothing fancy here — only sweat & tears. 🧵
📢 Apply to the #humanitésnumériques (digital humanities) master's programmes at the École des chartes - @psl-univ.bsky.social.
The École offers two master's programmes that train students to work with sources (objects, texts, images) using digital technologies.
Applications: 17 Feb. to 16 March.
▶️ Learn more: https://urls.fr/tr9XN4
Oh yes, that's quick!
This call is currently open for a Humanistica satellite event that might interest people in computational humanities (and beyond). It is supported by CultureLab and welcomes long papers as well as lightning talks and posters.
Do you like Automatic Text Recognition (ATR/OCR/HTR)?
Do you like challenges?
Well, we are opening a competition for ATR/OCR of medieval manuscripts in cross-lingual & diachronic settings, for the Latin script.
📆 20/01 Registration
📆 21/03 Test set released
📆 03/04 Deadline for results
Link: cmmhwr26.inria.fr
A quick fine-tune of a BERT model on your categories? But you'll need more examples.
Otherwise, the frugal option is to run your 1000 examples through an embedding model and fit an SVM on those embeddings (six months ago I was still using bge-m3 models for multilingual data; they were great).
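The frugal route fits in a few lines. Below is a minimal Pegasos-style linear SVM (stochastic sub-gradient descent on the L2-regularised hinge loss) in plain Python, for two categories. The 2-D toy vectors stand in for real sentence embeddings such as bge-m3, and the hyperparameters are illustrative assumptions. In practice one would more likely reach for scikit-learn's `LinearSVC` and handle several categories one-vs-rest.

```python
import random


def _augment(x):
    # Append a constant feature so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style linear SVM.

    X: list of embedding vectors; y: labels in {-1, +1}.
    lam: regularisation strength; the step size at update t is 1 / (lam * t).
    """
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step towards this example
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


# Toy "embeddings" (assumption): two well-separated clusters.
X = [[2, 2], [1.5, 2.5], [2.5, 1.5], [-2, -2], [-1.5, -2.5], [-2.5, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```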
New Publication: "Understanding Conversational AI: Philosophy, Ethics, and Social Impact of Large Language Models" (270 pages, Ubiquity Press, open access). Feel free to read it and share it widely! www.ubiquitypress.com/books/m/10.5...
Amazing papers at #CHR2025; particularly enjoying the computational literary studies. An observation: questions about genre as a confounding factor seem to keep coming up. I do wonder if (and I'm also guilty of this) CLS can fixate on the x-axis of history and we ought to give genre more attention.
@jbarre.bsky.social, @oseminck.bsky.social, Antoine Bourgois, and @tpoibeau.bsky.social built a detective detector, tracing the different archetypes in French detective fiction #CHR2025
Thx @oseminck.bsky.social for the pic!
Yuri and I standing by a literal cannon while we talk about canonicity in the Luxembourg city casemates
Luxembourg is a good place to talk about canonicity with Yuri #chr2025
Christmas comes earlier every year!
Last week I added a new post on my research blog. I wanted to react to a post by Dan Cohen about Gemini 3 that I had seen circulating on Bluesky, and figured I would add my critical two cents to the mix!
alix-tz.github.io/phd/posts/025/
New article in #JCLS 4(1)! 🎉
Visser Solissa, van Cranenburgh & @fpianz.bsky.social present a model for detecting syuzhet—the ordering and disclosure of events that shape a narrative—and formalize event annotation in fiction across multiple languages.
#CCLS25 #ComputationalNarratology