Journal of Beijing University of Posts and Telecommunications (Social Sciences Edition), 2021, Vol. 23, Issue (4): 28-41. doi: 10.19722/j.cnki.1008-7729.2021.0039

• Economics and Management •

Evolution of Microblog User Interest Topic Based on SGC-LDA

Fu Kui (傅魁)

  1. School of Economics, Wuhan University of Technology, Wuhan 430070, Hubei, China
  • Received: 2021-03-22 Online: 2021-08-30 Published: 2021-09-07
  • Corresponding author: Fu Kui
  • About the author: Fu Kui (born 1977), male, from Wuhan, Hubei; Ph.D., associate professor
  • Funding:
    Humanities and Social Sciences Research Planning Fund of the Ministry of Education (17YJA870006)

Abstract: Traditional user interest topic models are static, sensitive to noise, computationally expensive, and analyze interest evolution along only a single dimension. To address these problems, this paper builds on the sliding-window technique, introduces an interest-topic genetic factor to preserve topic continuity, and defines a common interest topic that captures general-purpose semantics and noise words. The resulting SGC-LDA (Sliding-window, Genetic factor and Common topic-Latent Dirichlet Allocation) user interest topic model is then used for topic evolution analysis of the data set, examining users' interest preferences and how they evolve along three dimensions: interest topic intensity, interest topic state, and interest topic path. An empirical analysis on a Sina Weibo (microblog) corpus shows that the SGC-LDA model outperforms the traditional LDA topic model and accurately describes how user interests evolve. Its miss rate, false-alarm rate, and normalized cost are all lower than those of the Baseline method, which applies no topic association filtering, supporting the effectiveness of the model.

Key words: user interests, topic evolution, Latent Dirichlet Allocation (LDA) model, evolution relationship, microblog
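
The abstract compares the model with the Baseline on three evaluation measures (miss rate, false-alarm rate, and normalized cost) but does not give their formulas on this page. The sketch below shows how such measures are commonly computed, assuming they follow the standard detection-cost definition used in Topic Detection and Tracking (TDT) evaluations. The `DetectionCounts` container, the function names, and the default constants (`c_miss=1.0`, `c_fa=0.1`, `p_target=0.02`, the conventional TDT settings) are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Confusion counts for a yes/no topic-association decision (hypothetical container)."""
    hits: int             # related items correctly flagged
    misses: int           # related items the system failed to flag
    false_alarms: int     # unrelated items wrongly flagged
    correct_rejects: int  # unrelated items correctly ignored


def miss_rate(c: DetectionCounts) -> float:
    """Fraction of truly related items that were missed."""
    targets = c.hits + c.misses
    return c.misses / targets if targets else 0.0


def false_alarm_rate(c: DetectionCounts) -> float:
    """Fraction of truly unrelated items that were wrongly flagged."""
    non_targets = c.false_alarms + c.correct_rejects
    return c.false_alarms / non_targets if non_targets else 0.0


def normalized_cost(c: DetectionCounts, c_miss: float = 1.0,
                    c_fa: float = 0.1, p_target: float = 0.02) -> float:
    """TDT-style detection cost divided by the cost of the better trivial system.

    The constants are assumed conventional defaults, not the paper's settings.
    """
    cost = (c_miss * miss_rate(c) * p_target
            + c_fa * false_alarm_rate(c) * (1.0 - p_target))
    # A trivial system answers always "no" (cost c_miss * p_target)
    # or always "yes" (cost c_fa * (1 - p_target)); normalize by the cheaper one.
    trivial = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / trivial


if __name__ == "__main__":
    counts = DetectionCounts(hits=80, misses=20, false_alarms=10, correct_rejects=90)
    print(miss_rate(counts), false_alarm_rate(counts), normalized_cost(counts))
```

Under this definition, lower is better on all three measures, and a normalized cost below 1 means the system does better than always answering "yes" or always answering "no".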

CLC Number: