Adaptive Mixed Component LDA for Low Resource Topic Modeling

Suzanna Sia, Kevin Duh

Track: Document analysis including Text Categorization and Topic Models (Long paper)

Zoom-5B: Apr 22 (12:00-13:00 UTC)
Gather-2A: Apr 22 (13:00-15:00 UTC)

Abstract: Probabilistic topic models in low-resource scenarios face less reliable estimates due to the sparsity of discrete word co-occurrence counts, and do not have the luxury of retraining word or topic embeddings with neural methods. In this challenging resource-constrained setting, we explore mixture models which interpolate between discrete and continuous topic-word distributions, utilising pre-trained embeddings to improve topic coherence. We introduce an automatic trade-off between the discrete and continuous representations via an adaptive mixture coefficient, which places greater weight on the discrete representation when the corpus statistics are more reliable. The adaptive mixture coefficient takes into account global corpus statistics and the uncertainty in each topic's continuous distributions. Our approach outperforms the fully discrete, fully continuous, and static mixture models on topic coherence in low-resource settings. We additionally demonstrate the generalisability of our method by extending it to handle multilingual document collections.
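
To make the interpolation concrete, the Python sketch below (using NumPy and SciPy) shows one way such a mixed topic-word distribution could be computed. This is an illustrative sketch, not the authors' implementation: the adaptive_lambda heuristic, its scale parameter, and the smoothing prior beta are hypothetical stand-ins for the paper's adaptive mixture coefficient and priors.

import numpy as np
from scipy.stats import multivariate_normal

def adaptive_lambda(n_tokens, topic_cov, scale=1e4):
    # Hypothetical heuristic (not the paper's exact formula): trust the
    # discrete counts more as the corpus grows, and less when the topic's
    # continuous distribution is uncertain (large covariance trace).
    uncertainty = np.trace(topic_cov)
    return n_tokens / (n_tokens + scale * uncertainty)

def topic_word_probs(counts, topic_mean, topic_cov, embeddings, lam, beta=0.01):
    # Discrete component: Dirichlet-smoothed relative frequencies of the
    # words assigned to this topic.
    p_discrete = (counts + beta) / (counts.sum() + beta * len(counts))

    # Continuous component: Gaussian density of each word's pre-trained
    # embedding under the topic's Gaussian, renormalised over the vocabulary.
    # Work in log space to avoid underflow in high embedding dimensions.
    log_dens = multivariate_normal(topic_mean, topic_cov).logpdf(embeddings)
    dens = np.exp(log_dens - log_dens.max())
    p_continuous = dens / dens.sum()

    # Interpolate: lam -> 1 trusts corpus counts, lam -> 0 trusts embeddings.
    return lam * p_discrete + (1.0 - lam) * p_continuous

# Toy usage with random data.
V, d = 5000, 50  # vocabulary size, embedding dimension
counts = np.random.poisson(0.5, size=V).astype(float)
embeddings = np.random.randn(V, d)
mean, cov = embeddings[counts > 0].mean(axis=0), np.eye(d)
lam = adaptive_lambda(counts.sum(), cov)
p = topic_word_probs(counts, mean, cov, embeddings, lam)

For each topic, lam near 1 trusts the corpus counts, while lam near 0 falls back on the pre-trained embedding Gaussian, which matches the paper's stated behaviour of weighting the discrete representation more when corpus statistics are more reliable.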

Similar Papers

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings
Kailash Karthik Saravanakumar, Miguel Ballesteros, Muthu Kumar Chandrasekaran, Kathleen McKeown
Generative Text Modeling through Short Run Inference
Bo Pang, Erik Nijkamp, Tian Han, Ying Nian Wu