Specifying vocabulary input in LDA

You can use scikit-learn's CountVectorizer for this:

    from sklearn.feature_extraction.text import CountVectorizer
    from gensim import matutils
    from gensim.models.ldamodel import LdaModel

    text = ['computer time graph', 'survey response eps', 'human system computer',
            'machinelearning is very hot topic',
            'python win the race for simplicity as compared to other programming language']

    # suppose these are the words you want to use in your vocabulary
    vocabulary = ['machine', 'python', 'learning', 'human', 'system', 'hot', 'time']

    vect = CountVectorizer(vocabulary=vocabulary)
    x = vect.fit_transform(text)
    # use get_feature_names() instead on scikit-learn versions older than 1.0
    feature_names = list(vect.get_feature_names_out())

    # now you can use gensim's matutils helper function;
    # documents_columns=False because scikit-learn stores one document per row
    model = LdaModel(matutils.Sparse2Corpus(x, documents_columns=False),
                     num_topics=3,
                     id2word=dict(enumerate(feature_names)))

    # print the topics
    print(model.show_topics())

    # to see the vocabulary that is being used
    print(feature_names)
    ['machine', 'python', 'learning', 'human', 'system', 'hot', 'time']

The output contains only the features you asked to include.
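To make it easier to see what a fixed vocabulary does to the corpus, here is a minimal pure-Python sketch of the same idea. It is not how CountVectorizer or gensim work internally: `to_bow` is a hypothetical helper, and it splits on whitespace, which is simpler than CountVectorizer's default tokenizer. It produces documents in the list-of-(id, count) form that gensim uses for a bag-of-words corpus.

```python
from collections import Counter


def to_bow(doc, vocabulary):
    """Count only the tokens of `doc` that appear in `vocabulary`.

    Hypothetical helper for illustration: returns a sorted list of
    (word_id, count) pairs, where word_id is the word's index in `vocabulary`.
    """
    word_to_id = {word: i for i, word in enumerate(vocabulary)}
    # Simplifying assumption: lowercase and split on whitespace.
    counts = Counter(tok for tok in doc.lower().split() if tok in word_to_id)
    return sorted((word_to_id[word], n) for word, n in counts.items())


text = ['computer time graph', 'survey response eps', 'human system computer',
        'machinelearning is very hot topic',
        'python win the race for simplicity as compared to other programming language']
vocabulary = ['machine', 'python', 'learning', 'human', 'system', 'hot', 'time']

corpus = [to_bow(doc, vocabulary) for doc in text]
for doc, bow in zip(text, corpus):
    print(bow, '<-', doc)
```

Two things stand out when running this on the sample data. The second document contains no vocabulary words, so it becomes empty. And 'machinelearning' is a single token, so it matches neither 'machine' nor 'learning'; only whole tokens are counted.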
