
Generating a dense matrix from a sparse matrix in numpy (Python)

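I have a SQLite database with the following schema:

termcount(doc_num, term, count)

The table stores terms and their counts in each document, for example:

(doc1, term1, 12)
(doc1, term22, 2)
.
.
(docn, term1, 10)

This can be viewed as a sparse matrix, since each document contains only a few terms with nonzero counts.

How can I use numpy to create a dense matrix from this sparse matrix? I need it to compute the similarity between documents using cosine similarity.

The dense matrix would look like a table with the doc IDs in the first column and all terms in the first row; the remaining cells would hold the counts.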


鸿蒙传说
2 Answers

沧海一幻觉

    >>> from scipy.sparse import csr_matrix
    >>> A = csr_matrix([[1,0,2],[0,3,0]])
    >>> A
    <2x3 sparse matrix of type '<type 'numpy.int64'>'
        with 3 stored elements in Compressed Sparse Row format>
    >>> A.todense()
    matrix([[1, 0, 2],
            [0, 3, 0]])
    >>> A.toarray()
    array([[1, 0, 2],
           [0, 3, 0]])

This is an example, taken from scipy, of how to convert a sparse matrix to a dense matrix.
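The example above starts from a matrix that already exists. To get there from the (doc_num, term, count) rows in the question, something like the following might work. It is only a sketch: the in-memory database, the sample rows, and the names `doc_index` and `term_index` are assumptions made for illustration, not part of the original question.

```python
import sqlite3

from scipy.sparse import coo_matrix

# Hypothetical in-memory database mirroring the termcount schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE termcount (doc_num TEXT, term TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO termcount VALUES (?, ?, ?)",
    [("doc1", "term1", 12), ("doc1", "term22", 2), ("docn", "term1", 10)],
)
triples = conn.execute("SELECT doc_num, term, count FROM termcount").fetchall()
conn.close()

# Map every document and term to a row/column index (sorted for a stable order).
docs = sorted({d for d, _, _ in triples})
terms = sorted({t for _, t, _ in triples})
doc_index = {d: i for i, d in enumerate(docs)}
term_index = {t: j for j, t in enumerate(terms)}

# Build the sparse matrix from (value, (row, col)) triples, then densify it.
row_idx = [doc_index[d] for d, _, _ in triples]
col_idx = [term_index[t] for _, t, _ in triples]
counts = [c for _, _, c in triples]
sparse = coo_matrix(
    (counts, (row_idx, col_idx)), shape=(len(docs), len(terms))
).tocsr()
dense = sparse.toarray()
```

Here `docs[i]` labels row `i` and `terms[j]` labels column `j`, so the ID labels are kept alongside the plain numeric array. For a large vocabulary it may be better to keep working with `sparse` and only call `toarray()` when a dense array is really needed.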

小怪兽爱吃肉

I solved this problem with pandas, because we want to keep the document IDs and term IDs.

    from pandas import DataFrame

    # A sparse matrix in dictionary form (could come from a SQLite database).
    # Each key is a (doc_id, term_id) tuple.
    doc_term_dict = {('d1','t1'): 12, ('d2','t3'): 10, ('d3','t2'): 5}

    # Extract all unique document and term IDs (sorted for a stable order)
    # and initialize a zero-filled DataFrame.
    rows = sorted({d for (d, t) in doc_term_dict})
    cols = sorted({t for (d, t) in doc_term_dict})
    df = DataFrame(0, index=rows, columns=cols)

    # Assign all nonzero values in the DataFrame.
    for (d, t), value in doc_term_dict.items():
        df.loc[d, t] = value

    print(df)

Output:

        t1  t2  t3
    d1  12   0   0
    d2   0   0  10
    d3   0   5   0
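Since the question's end goal is cosine similarity between documents, here is a minimal numpy sketch of that step on a dense count matrix. The function name `cosine_similarity_matrix` and the sample array (built from the doc1/docn rows in the question) are assumptions for illustration only.

```python
import numpy as np


def cosine_similarity_matrix(counts):
    """Return the pairwise cosine similarity between the rows of `counts`."""
    counts = np.asarray(counts, dtype=float)
    # Euclidean length of each document vector, kept as a column for broadcasting.
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    # Avoid dividing by zero for documents with no terms; their rows stay all zero.
    norms[norms == 0] = 1.0
    unit = counts / norms
    # The dot product of unit vectors is the cosine of the angle between them.
    return unit @ unit.T


# Rows are doc1 and docn, columns are term1 and term22 (sample data from the question).
dense = np.array([[12, 2], [10, 0]])
sim = cosine_similarity_matrix(dense)
```

`sim[i, j]` holds the similarity between document `i` and document `j`, and the diagonal is 1 for every non-empty document. With a pandas DataFrame, passing `df.to_numpy()` should give the same kind of input.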