That's interesting, intuitively I'd think that you need a certain document length for LDA to reliably capture repeated word co-occurrences as signals for underlying topics... It would be quite useful to do a systematic comparison that doesn't have all the flaws of the above-mentioned paper!
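To make the intuition above concrete: LDA's topic signal comes from words co-occurring within the same document, and short documents contain far fewer co-occurring pairs. A minimal, self-contained sketch (the corpus, the `cooccurrence_counts` helper, and all vocabulary are made up for illustration, not taken from the thread or any paper):

```python
# Toy illustration of why document length matters for LDA: the same text,
# kept as one long document vs. split into tweet-sized fragments, yields
# very different amounts of co-occurrence evidence.
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count within-document word-pair co-occurrences across a corpus."""
    pairs = Counter()
    for doc in docs:
        tokens = sorted(set(doc.split()))  # unique tokens per document
        pairs.update(combinations(tokens, 2))
    return pairs

# One "long" document vs. the same words split into short fragments.
long_docs = ["economy tax budget growth economy inflation tax"]
short_docs = ["economy tax", "budget growth", "economy inflation", "tax"]

long_pairs = cooccurrence_counts(long_docs)
short_pairs = cooccurrence_counts(short_docs)

print(len(long_pairs), len(short_pairs))  # → 10 3
```

The long document exposes all 10 pairwise combinations of its 5 unique words, while the fragmented version only ever observes 3 pairs, so an LDA fit on the fragments has much sparser evidence to work with.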
Posts by Armin Pournaki
Also, I've never quite seen the value of stemming for topic modeling, but I'd argue that preprocessing in general is an important part of any method. That's why I wouldn't necessarily copy every preprocessing step when comparing methods, especially when the methods being compared are quite different.
I suspect that a good number of citations stem from the fact that the paper "confirms" a common experience in the CSS community: that embedding-based topic models tend to produce more interpretable topics than LDA _on very short texts_.
@some4dem.bsky.social researchers @eckolb.bsky.social and @pournaki.bsky.social presenting their work at #ic2s2 on measuring political realignment in Switzerland and extracting conflicting narratives from polarized debates on social media.
Looking forward to #ic2s2 where I'll present some of our latest work from the @some4dem.bsky.social project:
- Conflicting narratives and polarization (Tue in Pol.Narratives II 2:30pm)
- A political cartography of news sharing (Tue, poster)
- Issue alignment and polarization on Twitter (Thu, poster)
Thanks for sharing!