Posts by michael ginn

Really excited about this work w/ my long-time collaborators at Boulder!

We address limitations in existing morphosyntactic annotation systems for digitally under-resourced languages and show how *jointly* predicting morphological segmentation helps with glossing performance

2 weeks ago 6 3 0 0
https://github.com/lecs-lab/polygloss

The models are available right now on HuggingFace! See usage instructions on our GitHub: t.co/cqRgURAKIA

2 weeks ago 1 0 0 0

Check it out here: arxiv.org/pdf/2601.10925

This work is the result of an ongoing collaboration between @lecslab.bsky.social and @ltiatcmu.bsky.social. Many thanks to my collaborators including @alexispalmer.bsky.social @lindiatjuatja.bsky.social @gneubig.bsky.social, and others!

2 weeks ago 1 0 1 0

Furthermore, we show that our model can quickly adapt to a new language via LoRA with minimal examples, making it a practical choice for real-world documentation.

2 weeks ago 0 0 1 0

Our new model jointly predicts the segmentation and glosses, optimizing for accuracy on both tasks and alignment between them. Predictions are not only interpretable, but more accurate than our prior model!

2 weeks ago 1 0 1 0

Because the model predicted glosses without exposing the underlying morphological segmentation, the predictions were uninterpretable and difficult to trust.

2 weeks ago 0 0 1 0

Excited to announce that the PolyGloss paper has been accepted to @aclmeeting.bsky.social!

Previously, we trained models to help in endangered language documentation workflows by automatically predicting interlinear glosses. But real-world user studies revealed crucial issues...

2 weeks ago 5 3 1 1
Is linguistically-motivated data augmentation worth it? Data augmentation, a widely-employed technique for addressing data scarcity, involves generating synthetic data examples which are then used to augment available training data. Researchers have seen s...

Find out in our new paper, which my colleague Ray Groshan will present at ACL!!

arxiv.org/abs/2506.03593

10 months ago 2 0 0 0

“Linguistically-motivated” techniques for data augmentation sound good on paper, but are they worth the cost?

10 months ago 0 0 1 0
Measuring Contextual Informativeness in Child-Directed Text To address an important gap in creating children's stories for vocabulary enrichment, we investigate the automatic evaluation of how well stories convey the semantics of target vocabulary words, a tas...

Excited to be presenting my work with @teaywright.bsky.social at #COLING2025 next week in Abu Dhabi! Find us in poster session 6/E on Jan 22nd (11 AM in the atrium).

Paper: arxiv.org/abs/2412.17427

1 year ago 10 3 1 0

Well isn’t the idea that the entire layer defines a high dimensional space, where each neuron is a dimension?

1 year ago 0 0 1 0

There’s no conspiracy to make tech products worse by putting AI in things. AI is just very immediately and clearly productivity-enhancing to the people making tech products in a way that it isn’t necessarily to the people using them.

1 year ago 110 7 9 0

Randomly stumbled on an arXiv paper where I’m pretty sure the listed affiliations are false. What would you even get out of that?

1 year ago 0 0 0 0

Been reading a lot of old-school finite-state automata papers for a project

It is so refreshing to read an interesting, colorfully-written paper that isn’t hyperoptimized for reviewer preferences

1 year ago 1 0 0 0

Probably doesn’t help that there is effectively an online cult promoting all of this

1 year ago 2 0 0 0

I have very mixed feelings on the current era in tech—I started a PhD because I thought LLMs were pretty cool, but I absolutely cannot stand the disingenuous hype, insane competitiveness, and slop features that have since come with them

1 year ago 8 0 2 0

It’s interesting how they describe patches of bytes that are determined by changes in entropy, without making any reference to morphology…Zellig Harris did basically the same thing 50 years ago

1 year ago 1 0 0 0
Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation The data and compute requirements of current language modeling technology pose challenges for the processing and analysis of low-resource languages. Declarative linguistic knowledge has the potential ...

Can RAG+LLM systems help boost small models for rare languages?

Find out in “Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation” by Bhargav Shandilya and @alexispalmer.bsky.social

arxiv.org/abs/2410.00387

1 year ago 1 1 1 0

Also shout-out to the morphology reviewers!

1 year ago 1 0 0 0

Adding my love letter to

arxiv.org/pdf/2304.01315

Empirical Design in Reinforcement Learning
by
Andrew Patterson, Samuel Neumann, Martha White, Adam White

JMLR 25 (2024) 1-63
#ReinforcementLearning

These aren’t the heroes we deserve, but they are the heroes we need.

1 year ago 211 47 7 6

I’m a big proponent of an accumulating reviewer score (complementary to h-index). I think people would absolutely care about optimizing it even with no concrete incentive.

1 year ago 6 0 1 0

I feel like reviewers often expect short papers to be long papers condensed into 4 pages. They should really be a venue to showcase focused and incremental work.

1 year ago 4 1 0 0

🙋‍♂️

1 year ago 1 0 0 0

Interested in ML open source? There’s a great list for you

1 year ago 3 0 0 0

Python typing is great until you want to use any package ever

1 year ago 0 0 0 0

Hi, unfortunately the pack is now full, however @datatherapist.bsky.social started a third one! go.bsky.app/CUuio7g

1 year ago 1 0 0 0
