Row index exceeds matrix dimensions with scipy csr_matrix

I am new to Python and pandas, and I have the following problem.


I have a dataset:


df = pd.read_csv('/home/nikoscha/Documents/ThesisR/dataset.csv', names=['response_nn','event','user'])

I am trying to create a csr_matrix with the following code:


# Create lists of all events, users and responses

events = list(np.sort(df.event_id.unique()))

users = list(np.sort(df.user_id.unique()))

responses = list(df.responses)


# Get the rows and columns for our new matrix

rows = df.user_id.astype(float)

cols = df.event_id.astype(float)


# Construct a sparse matrix for our users and items containing number of plays

data_sparse = sp.csr_matrix((responses, (rows, cols)), shape=(len(users), len(events)))

The code above works. But when I take a training subset


mask = np.random.rand(len(df)) < 0.5

df = df[mask]

df = df.reset_index() 

df = df.drop(['index'], axis=1)

or just delete specific rows


df = df[df.responses != 2]

and then try to construct the sparse matrix, I get the following error:


ValueError: row index exceeds matrix dimensions


Can anyone explain why? Thanks in advance.


qq_遁去的一_1
1 answer

蝴蝶刀刀

As explained in the scipy documentation, a csr_matrix can be initialized in this form:

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

In scipy.sparse.csr.py:

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
    where `data`, `row_ind` and `col_ind` satisfy the
    relationship `a[row_ind[k], col_ind[k]] = data[k]`.

When the csr_matrix is initialized, it checks row_ind.max() against M. Likewise, in scipy.sparse.coo.py:

if self.row.max() >= self.shape[0]:
    raise ValueError('row index exceeds matrix dimensions')
if self.col.max() >= self.shape[1]:
    raise ValueError('column index exceeds matrix dimensions')
if self.row.min() < 0:
    raise ValueError('negative row index found')
if self.col.min() < 0:
    raise ValueError('negative column index found')

So row_ind.max() and col_ind.max() must be less than M and N respectively. This is because the values in row_ind and col_ind are used directly as indices when the sparse matrix is constructed. For example:

import numpy as np
from scipy.sparse import csr_matrix

# Integer indices in [0, 8), so every index fits in an 8 x 8 matrix
a = np.random.randint(0, 8, size=(8, 2))
row = np.hstack((a[:,0],a[:,1]))
#row[0]=9
col = np.hstack([a[:,1],a[:,0]])
matrix = csr_matrix(([1]*row.shape[0], (row,col)),shape=(a.shape[0],a.shape[0]))

This works as long as row[0]=9 stays commented out. Uncommenting it puts an index of 9 into an 8-row matrix and triggers the error. Hope this helps.
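Applied to the question, a likely explanation (an assumption, since the dataset itself is not shown) is that filtering rows reduces the number of unique users and events while the remaining user_id and event_id values keep their original size, so the largest id can equal or exceed len(users) or len(events). The sketch below reproduces this on a small made-up DataFrame and shows one possible fix: remapping the ids to contiguous codes with pd.factorize before building the matrix. The toy data and the helper names build_naive and build_remapped are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Hypothetical toy data standing in for the question's dataset.
df = pd.DataFrame({
    "user_id":   [0, 1, 2, 3, 3],
    "event_id":  [0, 1, 2, 0, 2],
    "responses": [1, 2, 1, 1, 2],
})

# Dropping rows leaves users {0, 2, 3} and events {0, 2}:
# 3 unique users, but the largest user_id is still 3.
train = df[df.responses != 2]


def build_naive(frame):
    """Use the raw ids as indices, with the shape taken from unique counts."""
    users = np.sort(frame.user_id.unique())
    events = np.sort(frame.event_id.unique())
    return sp.csr_matrix(
        (frame.responses.to_numpy(),
         (frame.user_id.to_numpy(), frame.event_id.to_numpy())),
        shape=(len(users), len(events)),
    )


def build_remapped(frame):
    """Map ids to contiguous codes 0..n-1 first, then build the matrix."""
    user_codes, user_labels = pd.factorize(frame.user_id, sort=True)
    event_codes, event_labels = pd.factorize(frame.event_id, sort=True)
    matrix = sp.csr_matrix(
        (frame.responses.to_numpy(), (user_codes, event_codes)),
        shape=(len(user_labels), len(event_labels)),
    )
    return matrix, user_labels, event_labels


# The full frame works because its ids happen to be 0..n-1 with no gaps.
full_matrix = build_naive(df)

# The filtered frame fails: row index 3 does not fit in a 3-row matrix.
raised = False
try:
    build_naive(train)
except ValueError:
    raised = True

# After remapping, row i corresponds to user_labels[i],
# and column j corresponds to event_labels[j].
matrix, user_labels, event_labels = build_remapped(train)
```

An alternative, if the original ids should remain usable as indices, is to keep them unchanged and pass a shape of (max id + 1) for each axis. That keeps the lookup direct at the cost of empty rows and columns for the ids that were filtered out.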