Ldagibbs: a command for topic modeling in Stata using Latent Dirichlet Allocation

Schwarz, Carlo
2018

Abstract

This paper introduces the ldagibbs command, which implements Latent Dirichlet Allocation in Stata. Latent Dirichlet Allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet Allocation represents each document as a probability distribution over topics and each topic as a probability distribution over words. In this way, Latent Dirichlet Allocation offers a means of analyzing the content of large, unclassified text data and an alternative to predefined document classifications.
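
The two distributions described in the abstract can be pictured with a small sketch. It is written in Python rather than Stata, it does not use or reproduce ldagibbs, and every topic name, word, and probability in it is a hypothetical value chosen for illustration only.

```python
# Hypothetical toy values for illustration; not output of ldagibbs.

# Each topic is a probability distribution over words.
topic_word = {
    "economy": {"market": 0.5, "tax": 0.4, "goal": 0.1},
    "sports": {"market": 0.1, "tax": 0.1, "goal": 0.8},
}

# Each document is a probability distribution over topics.
doc_topic = {
    "doc1": {"economy": 0.9, "sports": 0.1},
    "doc2": {"economy": 0.2, "sports": 0.8},
}


def word_probability(doc, word):
    """Sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(
        share * topic_word[topic][word]
        for topic, share in doc_topic[doc].items()
    )


def dominant_topic(doc):
    """Return the topic with the largest share in the document."""
    shares = doc_topic[doc]
    return max(shares, key=shares.get)
```

Under these assumed values, `dominant_topic("doc2")` picks the topic with the largest share in that document, which is one simple way such output could stand in for a predefined document classification.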

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4032407