Really excited about this work w/ my long-time collaborators at Boulder!
We address limitations in existing morphosyntactic annotation systems for digitally under-resourced languages and show how *jointly* predicting morphological segmentation improves glossing performance.
Posts by Michael Ginn
The models are available right now on HuggingFace! See usage instructions on our GitHub: t.co/cqRgURAKIA
Check it out here: arxiv.org/pdf/2601.10925
This work is the result of an ongoing collaboration between @lecslab.bsky.social and @ltiatcmu.bsky.social. Many thanks to my collaborators including @alexispalmer.bsky.social @lindiatjuatja.bsky.social @gneubig.bsky.social, and others!
Furthermore, we show that our model can quickly adapt to a new language via LoRA with minimal examples, making it a practical choice for real-world documentation.
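For anyone curious what "adapt via LoRA" means in practice, here is a minimal, self-contained sketch of the low-rank adaptation idea itself (toy numbers, plain Python; this is not the paper's training code, and the dimensions are made up for illustration):

```python
# Illustrative LoRA sketch: instead of updating a full weight matrix W,
# we train a low-rank update B @ A, which has far fewer parameters.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, k, r = 4, 4, 1  # hypothetical layer size (d x k) and LoRA rank r

# Frozen base weight, here just an identity placeholder.
W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]

# Trainable low-rank factors: B is d x r, A is r x k.
B = [[0.5] for _ in range(d)]
A = [[0.1, 0.2, 0.3, 0.4]]

# Effective weight after adaptation: W' = W + B @ A.
delta = matmul(B, A)
W_adapted = [[w + dl for w, dl in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]

full_params = d * k        # parameters a full fine-tune would update
lora_params = r * (d + k)  # parameters LoRA actually trains
print(full_params, lora_params)  # 16 vs 8, even at this toy scale
```

The savings grow with layer size: for a realistic d = k = 4096 and r = 8, that's ~16.8M frozen parameters versus ~65K trained ones per matrix.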
Our new model jointly predicts the segmentation and glosses, optimizing for accuracy on both tasks and alignment between them. Predictions are not only interpretable, but more accurate than our prior model!
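To make the "alignment between them" idea concrete, here is a hypothetical sketch (not the paper's model or data) of the consistency check that joint prediction makes possible: each morpheme in the segmentation line should pair one-to-one with a label in the gloss line.

```python
# Hypothetical illustration: in interlinear glossed text (IGT), morphemes
# within a word and the gloss labels for that word are both '-'-separated,
# and the two lines should align one-to-one.

def is_aligned(segmentation: str, glosses: str) -> bool:
    """Check that every word has as many morphemes as gloss labels."""
    seg_words = segmentation.split()
    gloss_words = glosses.split()
    if len(seg_words) != len(gloss_words):
        return False
    return all(len(w.split("-")) == len(g.split("-"))
               for w, g in zip(seg_words, gloss_words))

# Toy example in the style of IGT (invented forms, not a real language):
print(is_aligned("ni-ka-soma kitabu", "1SG-PST-read book"))  # True
print(is_aligned("ni-ka-soma kitabu", "1SG-read book"))      # False
```

A model that predicts glosses without ever exposing a segmentation can't be checked this way, which is exactly the interpretability gap described above.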
Because the model predicted glosses without exposing the underlying morphological segmentation, the predictions were uninterpretable and difficult to trust.
Excited to announce that the PolyGloss paper has been accepted to @aclmeeting.bsky.social!
Previously, we trained models to help in endangered language documentation workflows by automatically predicting interlinear glosses. But real-world user studies revealed crucial issues...
Find out in our new paper, which my colleague Ray Groshan will present at ACL!!
arxiv.org/abs/2506.03593
“Linguistically-motivated” techniques for data augmentation sound good on paper, but are they worth the cost?
Excited to be presenting my work with @teaywright.bsky.social at #COLING2025 next week in Abu Dhabi! Find us in poster session 6/E on Jan 22nd (11 AM in the atrium).
Paper: arxiv.org/abs/2412.17427
Well isn’t the idea that the entire layer defines a high dimensional space, where each neuron is a dimension?
There’s no conspiracy to make tech products worse by putting AI in things; AI is just very immediately and clearly productivity-enhancing to the people making tech products in a way that it isn’t necessarily to the people using them.
Randomly stumbled on an arxiv paper where I’m pretty sure the listed affiliations are false. What would you even get out of that?
Been reading a lot of old-school finite-state automata papers for a project
It is so refreshing to read an interesting, colorfully-written paper that isn’t hyperoptimized for reviewer preferences
Probably doesn’t help that there is effectively an online cult promoting all of this
I have very mixed feelings on the current era in tech—I started a PhD because I thought LLMs were pretty cool, but I absolutely cannot stand the disingenuous hype, insane competitiveness, and slop features that have since come with them
It’s interesting how they describe patches of bytes that are determined by changes in entropy, without making any reference to morphology…Zellig Harris did basically the same thing 50 years ago
Can RAG+LLM systems help boost small models for rare languages?
Find out in “Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation” by Bhargav Shandilya and @alexispalmer.bsky.social
arxiv.org/abs/2410.00387
Also shout-out to the morphology reviewers!
Adding my love letter to “Empirical Design in Reinforcement Learning” by Andrew Patterson, Samuel Neumann, Martha White, and Adam White. JMLR 25 (2024) 1–63.
arxiv.org/pdf/2304.01315
#ReinforcementLearning
These aren’t the heroes we deserve, but they are the heroes we need.
I’m a big proponent of an accumulating reviewer score (complementary to h-index). I think people would absolutely care about optimizing it even with no concrete incentive.
I feel like reviewers often expect short papers to be long papers condensed into 4 pages. They should really be a venue to showcase focused and incremental work.
🙋‍♂️
Interested in ML open source? There’s a great list for you
Python typing is great until you want to use any package ever
Hi, unfortunately the pack is now full, however @datatherapist.bsky.social started a third one! go.bsky.app/CUuio7g