Problem using scikit-learn with a Pandas DataFrame

I'm new to Python and I'm having trouble using scikit-learn on a DataFrame created with Pandas. Here is the code:


import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import json

%matplotlib inline

data = pd.read_json('C:/Users/Desktop/Machine Learning/yelp_academic_dataset_business.json', lines=True, orient='columns', encoding='utf-8')
dataframe = pd.DataFrame(data)

list(dataframe)
subset_data = dataframe.loc[(dataframe.city == 'Toronto')]
print(subset_data)
documents = subset_data.to_dict('records')

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

no_features = 1000

# NMF is able to use tf-idf
tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=no_features, stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(documents)
tfidf_feature_names = tfidf_vectorizer.get_feature_names()

# LDA can only use raw term counts because it is a probabilistic graphical model
tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, max_features=no_features, stop_words='english')
tf = tf_vectorizer.fit_transform(documents)
tf_feature_names = tf_vectorizer.get_feature_names()

This is the error I get:


AttributeError: 'dict' object has no attribute 'lower'

The dataset is available here: kaggle.com/yelp-dataset/yelp-dataset (file: yelp_academic_dataset_business.json)


Any help would be greatly appreciated. Thank you.
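
For reference, here is a minimal sketch of what seems to trigger the error, and one possible way around it. The vectorizers expect an iterable of strings and lowercase each item, whereas to_dict('records') produces one dict per row. The tiny DataFrame below is made up for illustration, and the choice of `categories` as the text column is an assumption; any column holding plain text could be used instead.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny stand-in for the Yelp business data; these rows are invented for illustration.
dataframe = pd.DataFrame({
    "city": ["Toronto", "Toronto", "Toronto", "Las Vegas"],
    "categories": [
        "Restaurants, Pizza, Italian",
        "Restaurants, Sushi, Japanese",
        "Coffee, Cafes, Restaurants",
        "Nightlife, Bars",
    ],
})
subset_data = dataframe.loc[dataframe.city == "Toronto"]

# to_dict('records') yields one dict per row. The vectorizer tries to call
# .lower() on each item, which fails because a dict has no such method.
records = subset_data.to_dict("records")
try:
    TfidfVectorizer().fit_transform(records)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Passing one string per row instead. "categories" is an assumed text column;
# missing values are replaced with empty strings before vectorizing.
documents = subset_data["categories"].fillna("").astype(str).tolist()
tfidf_vectorizer = TfidfVectorizer(stop_words="english")
tfidf = tfidf_vectorizer.fit_transform(documents)
print(tfidf.shape)
```

The same list of strings could be passed to CountVectorizer. With the real dataset, the max_df, min_df and max_features settings from the question would presumably be added back.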


慕妹3146593