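From what I have read, I need to create the model and save it as a pipeline in order to do this. I have been trying to do so based on other examples on SO, but I can't get it to work. How do I turn my existing model into a pipeline version?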
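The first code snippet trains and saves the model. The second is one of my attempts to put it into a pipeline, but it fails with the error 'str' object has no attribute 'items'. I think this is related to the to_dict step, but I don't know how to replicate that step in the pipeline version. Can anyone help?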
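# Snippet 1: train the model and save it (works)
import pandas as pd
from joblib import dump
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# error_bad_lines was removed in pandas 2.0; newer versions use on_bad_lines="skip"
dframe = pd.read_csv("ner.csv", encoding = "ISO-8859-1", error_bad_lines=False)
dframe.dropna(inplace=True)
dframe[dframe.isnull().any(axis=1)].size  # sanity check: should be 0 after dropna
x_df = dframe.drop(['Unnamed: 0', 'sentence_idx', 'tag'], axis=1)
# DictVectorizer expects a list of dicts, one per row
vectorizer = DictVectorizer()
X = vectorizer.fit_transform(x_df.to_dict("records"))
y = dframe.tag.values
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
model = LinearSVC(loss="squared_hinge",C=0.5,class_weight='balanced',multi_class='ovr')
model.fit(x_train, y_train)
# Only the model is saved here; the fitted vectorizer is not
dump(model, 'filename.joblib')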

# Snippet 2: pipeline attempt (fails with the 'items' error)
from sklearn.pipeline import Pipeline

dframe = pd.read_csv("ner.csv", encoding = "ISO-8859-1", error_bad_lines=False)
dframe.dropna(inplace=True)
dframe[dframe.isnull().any(axis=1)].size
x_df = dframe.drop(['Unnamed: 0', 'sentence_idx', 'tag'], axis=1)
y = dframe.tag.values
x_train, x_test, y_train, y_test = train_test_split(x_df, y, test_size=0.1, random_state=0)
pipe = Pipeline([('vectorizer', DictVectorizer(x_df.to_dict("records"))), ('model', LinearSVC)])
pipe.fit(x_train, y_train)
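
One possible way to get a working pipeline is sketched below. The attempt above appears to have three problems: the records are passed to the DictVectorizer constructor, which does not take data; LinearSVC is passed as a class rather than an instance; and the pipeline hands the raw DataFrame to DictVectorizer, which iterates over it and gets the column names (strings) instead of dicts, which would explain the 'items' error. The sketch moves the to_dict step into the pipeline with a FunctionTransformer. The helper name to_records, the file name pipeline.joblib, and the tiny in-memory DataFrame standing in for ner.csv are assumptions made only so the example runs on its own.

```python
import pandas as pd
from joblib import dump, load
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


def to_records(df):
    """Turn a DataFrame into the list of dicts that DictVectorizer expects."""
    return df.to_dict("records")


# Made-up stand-in for the feature columns of ner.csv (assumption, not real data).
x_df = pd.DataFrame({
    "word": ["London", "is", "Paris", "was", "Berlin", "the"],
    "pos": ["NNP", "VBZ", "NNP", "VBD", "NNP", "DT"],
})
y = ["B-geo", "O", "B-geo", "O", "B-geo", "O"]

pipe = Pipeline([
    # Step 1: DataFrame -> list of dicts, replacing the manual to_dict call.
    ("to_dict", FunctionTransformer(to_records)),
    # Step 2: list of dicts -> sparse feature matrix (no data in the constructor).
    ("vectorizer", DictVectorizer()),
    # Step 3: an estimator instance, not the class; multi_class defaults to 'ovr'.
    ("model", LinearSVC(loss="squared_hinge", C=0.5, class_weight="balanced")),
])

# Fit on the raw DataFrame; the pipeline does the vectorizing itself.
pipe.fit(x_df, y)

# Saving the pipeline keeps the fitted vectorizer together with the model.
dump(pipe, "pipeline.joblib")
restored = load("pipeline.joblib")
predictions = restored.predict(x_df)
```

With the real data, train_test_split would be applied to x_df and y as in the attempt above, and pipe.fit would receive x_train and y_train. Note that joblib stores to_records by reference, so the function has to be importable under the same name wherever the saved pipeline is loaded.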
慕容708150