I'm running a very simple experiment with ColumnTransformer, where the goal is to transform a list of columns, in this case ["a"]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
dataset = pd.DataFrame({"a": ["word gone wild", "gone with wind"], "c": [1, 2]})
tfidf = TfidfVectorizer(min_df=0)
clmn = ColumnTransformer([("tfidf", tfidf, ["a"])], remainder="passthrough")
clmn.fit_transform(dataset)
This gives me:
ValueError: empty vocabulary; perhaps the documents only contain stop words
Yet TfidfVectorizer has no trouble running fit_transform() on the column by itself:
tfidf.fit_transform(dataset.a)
<2x5 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
What could be causing this error, and how can I fix it?
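One likely explanation, offered as a sketch rather than a confirmed diagnosis: when the column is given as a list (["a"]), ColumnTransformer passes a 2-D DataFrame to the transformer. TfidfVectorizer expects a 1-D iterable of strings, and iterating over a DataFrame yields its column labels, not its rows. The only "document" it sees is then the label "a", which the default token pattern (two or more characters) discards, leaving an empty vocabulary. Assuming that is the cause, passing the column as a plain string should hand the vectorizer a 1-D Series. The example below uses the vectorizer's default settings (no min_df) to keep it minimal.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = pd.DataFrame({"a": ["word gone wild", "gone with wind"], "c": [1, 2]})

# Select the text column with a scalar string ("a") instead of a list (["a"]),
# so the vectorizer receives a 1-D Series of documents rather than a 2-D frame.
clmn = ColumnTransformer(
    [("tfidf", TfidfVectorizer(), "a")],
    remainder="passthrough",
)

result = clmn.fit_transform(dataset)

# Five TF-IDF features (one per distinct word) plus the passthrough column "c".
print(result.shape)
```

The same rule applies to any transformer that expects 1-D input, so each text column needs its own entry in the transformer list, each referenced by a string.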