Evaluation of the Dirichlet process multinomial mixture model for short-text topic modeling
Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi. (Skövde Artificial Intelligence Lab (SAIL)). ORCID iD: 0000-0003-2973-3112
Campus Chapecó, Federal University of Fronteira Sul, Chapecó, Brazil.
Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi. (Skövde Artificial Intelligence Lab (SAIL)). ORCID iD: 0000-0001-7106-0025
Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi. (Skövde Artificial Intelligence Lab (SAIL))
2018 (English). In: Proceedings - 6th International Symposium on Computational and Business Intelligence, ISCBI 2018, USA: Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 79-83, article id 8638311. Conference paper, Published paper (Refereed)
Abstract [en]

Fast-moving trends, both in society and in highly competitive business areas, call for effective methods for automatic analysis. The availability of fast-moving sources in the form of short texts, such as social media and blogs, allows aggregation from a vast number of text sources, for an up-to-date view of trends and business insights. Topic modeling is established as an approach for analysis of large amounts of text, but the scarcity of statistical information in short texts is considered to be a major problem for obtaining reliable topics from traditional models such as LDA. A range of specialized topic models has been proposed, but a majority of these approaches rely on rather strong parametric assumptions, such as setting a fixed number of topics. In contrast, recent advances in the field of Bayesian non-parametrics suggest the Dirichlet process as a method that, given certain hyper-parameters, can self-adapt to the number of topics of the data at hand. We perform an empirical evaluation of the Dirichlet process multinomial (unigram) mixture model against several parametric topic models, initialized with different numbers of topics. The resulting models are evaluated using both direct and indirect measures that have been found to correlate well with human topic rankings. We show that the Dirichlet process multinomial mixture model is a viable option for short-text topic modeling since, on average, it performs better than, or nearly as well as, the parametric alternatives, while reducing parameter-setting requirements and thereby eliminating the need for expensive preprocessing.
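
The abstract's claim that a Dirichlet process can adapt the number of topics to the data can be illustrated with the Chinese restaurant process, a standard way of describing the cluster assignments a Dirichlet process induces. The following is a minimal sketch of that prior only, not the inference procedure or the evaluation used in the paper. The function name `crp_assignments` and the parameters `n_docs`, `alpha` and `seed` are hypothetical names chosen for illustration.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Sample cluster labels for n_docs items from a Chinese restaurant process.

    alpha is the concentration hyper-parameter (must be > 0); larger values
    make opening a new cluster more likely. Illustrative sketch only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently assigned to cluster k
    labels = []
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        _, counts = crp_assignments(n_docs=200, alpha=alpha)
        print(f"alpha={alpha}: {len(counts)} clusters")
```

The number of clusters is never fixed in advance: it emerges from the sampling and grows, on average, with `alpha` and with the number of documents. In a full mixture model the weights would additionally be multiplied by the likelihood of each document's words under each cluster, which is omitted here.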

Place, publisher, year, edition, pages
USA: Institute of Electrical and Electronics Engineers (IEEE), 2018. p. 79-83, article id 8638311
Keywords [en]
Bayesian-nonparametrics, Dirichlet-process, short-text, text-analysis, topic-modeling, Information analysis, Bayesian nonparametrics, Dirichlet process, Short texts, Text analysis, Topic Modeling, Mixtures
National subject category
Computer Sciences
Research subject
Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-16747
DOI: 10.1109/ISCBI.2018.00025
ISI: 000462379700015
Scopus ID: 2-s2.0-85063024705
ISBN: 978-1-5386-9450-3 (digital)
ISBN: 978-1-5386-9451-0 (print)
OAI: oai:DiVA.org:his-16747
DiVA, id: diva2:1302774
Conference
6th International Symposium on Computational and Business Intelligence (ISCBI), 27-29 August 2018, Basel, Switzerland
Available from: 2019-04-05. Created: 2019-04-05. Last updated: 2019-09-30. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Person records

Karlsson, Alexander; Mathiason, Gunnar; Bae, Juhee
