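I'm building a logistic regression model to predict whether a trade is valid (1) or not (0), using a dataset of only 150 observations. The data is split between the two classes as follows:
106 observations are 0 (not valid)
44 observations are 1 (valid)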
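For reference, 106 out of 150 is about 70.7%, so a model that always predicted 0 would already reach roughly 0.707 accuracy on this data. The sketch below rebuilds a label vector from the counts above (not from the actual CSV). It computes that majority-class baseline and the per-class weights that scikit-learn's `class_weight="balanced"` option applies, namely n_samples / (n_classes * count of that class).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label vector rebuilt from the stated counts: 106 zeros, 44 ones (not the real CSV).
y = np.array([0] * 106 + [1] * 44)

# Accuracy of a trivial model that always predicts the majority class (0).
baseline_acc = np.mean(y == 0)

# Weights used by class_weight="balanced": n_samples / (n_classes * class count).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
print(f"balanced weight for class 0: {weights[0]:.3f}")
print(f"balanced weight for class 1: {weights[1]:.3f}")
```

The minority class (1) gets the larger weight, so passing `class_weight="balanced"` to `LogisticRegression` makes mistakes on valid trades cost more during fitting.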
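I'm using two predictors (both numeric). Even though most of the data is 0, my classifier predicts 1 for every trade in my test set, although most of those trades should be 0. The classifier never outputs 0 for any observation.
Here is my full code: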
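One way to check a symptom like "never outputs 0" is to hold out a stratified test set and inspect the confusion matrix. A single accuracy number can hide which class is actually being predicted. This is a minimal sketch on hypothetical synthetic data, since the real `dummy_csv-150.csv` is not available here. The feature distributions, `test_size=0.3`, and `random_state` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real data: two numeric features,
# 106 rows of class 0 and 44 rows of class 1 (same counts as above).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[2.0, 150.0], scale=[0.8, 80.0], size=(106, 2))
X1 = rng.normal(loc=[1.5, 600.0], scale=[0.8, 300.0], size=(44, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 106 + [1] * 44)

# stratify=y keeps the 0/1 ratio roughly the same in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels, both in the order 0, 1.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
print("predicted label counts [0, 1]:", np.bincount(y_pred, minlength=2))
```

If every test row lands in the second column of the matrix, the model is predicting only 1.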
```python
# Logistic Regression
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import scipy
from scipy.stats import spearmanr
from pylab import rcParams
import seaborn as sb
import matplotlib.pyplot as plt
import sklearn
from sklearn.preprocessing import scale
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import preprocessing
from sklearn.metrics import classification_report

address = "dummy_csv-150.csv"
trades = pd.read_csv(address)
trades.columns = ['location', 'app', 'el', 'rp', 'rule1', 'rule2', 'rule3', 'validity', 'transactions']
print(trades.head())

# predictors: column 1 ('app') and column 8 ('transactions')
# (.ix was removed in pandas 1.0; .iloc does the same positional selection)
trade_data = trades.iloc[:, [1, 8]].values
trade_data_names = ['app', 'transactions']

# set dependent/response variable: column 7 ('validity')
y = trades.iloc[:, 7].values

# standardize each predictor: subtract the column mean, divide by the column standard deviation
X = scale(trade_data)

LogReg = LogisticRegression()
LogReg.fit(X, y)
print(LogReg.score(X, y))

y_pred = LogReg.predict(X)
print(classification_report(y, y_pred))

# new trades to classify, as [app, transactions] pairs
new_trades = [[2, 14], [3, 1], [1, 503], [1, 122], [1, 101], [1, 610],
              [1, 2120], [3, 85], [3, 91], [2, 167], [2, 553], [2, 144]]
log_prediction = LogReg.predict_log_proba(new_trades)
prediction = LogReg.predict(new_trades)
print(prediction)
```
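One thing worth checking: the model is fit on `X = scale(trade_data)`, i.e. standardized features, but `predict` and `predict_log_proba` receive raw rows such as `[1, 2120]`. In standardized units a transactions value of 2120 sits thousands of standard deviations from the mean, which can push every new row to the same class. This is a plausible cause of the all-1 output, though it cannot be confirmed without the data. Below is a minimal sketch on hypothetical synthetic data standing in for the `app` and `transactions` columns. It keeps `StandardScaler` and `LogisticRegression` together in a scikit-learn pipeline, so new rows are transformed with the training mean and standard deviation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale

# Hypothetical stand-in for the [app, transactions] columns: 106 zeros, 44 ones.
rng = np.random.default_rng(42)
X0 = np.column_stack([rng.integers(1, 4, 106), rng.normal(200, 100, 106)])
X1 = np.column_stack([rng.integers(1, 4, 44), rng.normal(700, 300, 44)])
X = np.vstack([X0, X1])
y = np.array([0] * 106 + [1] * 44)

# StandardScaler learns mean/std during fit(); the pipeline reapplies that same
# transform inside predict(), so raw rows can be passed in directly.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

new_trades = np.array([
    [2, 14], [3, 1], [1, 503], [1, 122], [1, 101], [1, 610],
    [1, 2120], [3, 85], [3, 91], [2, 167], [2, 553], [2, 144],
])
prediction = model.predict(new_trades)
print("pipeline predictions:", prediction)
print(model.predict_proba(new_trades).round(3))

# Equivalent manual version: reuse the fitted scaler, never refit it on new rows.
scaler = model.named_steps["standardscaler"]
clf = model.named_steps["logisticregression"]
manual = clf.predict(scaler.transform(new_trades))
assert (manual == prediction).all()

# For contrast, the pattern from the question: fit on scale(X), predict on raw rows.
naive = LogisticRegression().fit(scale(X), y)
print("raw rows into scaled model:", naive.predict(new_trades))
```

Comparing the last line with the pipeline predictions shows whether the unscaled inputs change the result. With the real data, the same comparison would test whether the scaling mismatch is what produces the all-1 predictions.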