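I want to use xgboost in Python for my upcoming model. However, since our production system is in SAS, I am trying to extract the decision rules from xgboost and then write SAS scoring code to implement the model in the SAS environment.
The two links above were very helpful for xgboost deployment, especially the code given by Shiutang-Li. However, my predicted scores do not match exactly.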
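For context, this is how I understand a `binary:logistic` score to be assembled from the extracted rules: each tree contributes one leaf value, the leaf values are summed into a margin, and the margin goes through a sigmoid. The sketch below is only an illustration. The function name `logistic_score` and the example leaf values are hypothetical, and `base_margin=0.0` assumes a `base_score` of 0.5, which may not hold for every xgboost version or configuration.

```python
import math


def logistic_score(leaf_values, base_margin=0.0):
    """Turn the per-tree leaf values for one row into a probability.

    base_margin=0.0 is an assumption: it corresponds to base_score=0.5,
    because logit(0.5) = 0. Check the base_score the model actually uses.
    """
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical leaf values, one per boosting round (made up for illustration).
example_leaves = [0.15, 0.12, -0.05]
print(logistic_score(example_leaves))
```

If the SAS code reproduces the same leaf for every tree but the final score still differs, the margin offset is one of the first things worth checking.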
Here is the code I have tried so far:
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
%matplotlib inline
import graphviz
from graphviz import Digraph
# Read the sample iris data:
iris = pd.read_csv("C:\\Users\\XXXX\\Downloads\\Iris.csv")
# Create dependent variable:
iris.loc[iris["class"] != 2, "class"] = 0
iris.loc[iris["class"] == 2, "class"] = 1
# Select independent and dependent variable:
X = iris[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
Y = iris["class"]
xgdmat = xgb.DMatrix(X, label=Y)  # Create our DMatrix to make XGBoost more efficient
# Build the sample xgboost Model:
our_params = {'eta': 0.1, 'seed': 0, 'subsample': 0.8, 'colsample_bytree': 0.8,
              'objective': 'binary:logistic', 'max_depth': 3, 'min_child_weight': 1}
Base_Model = xgb.train(our_params, xgdmat, num_boost_round=10)
#Below code reads the dump file created by xgboost and writes a scoring code in SAS:
import re
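def string_parser(s):
    # Split nodes look like "0:[feature<threshold] yes=1,no=2,missing=1".
    if len(re.findall(r":leaf=", s)) == 0:
        # out[0] = node id, out[1] = feature, out[2] = threshold,
        # out[4] = "yes" target, out[8] = "missing" target.
        out = re.findall(r"[\w.-]+", s)
        tabs = re.findall(r"[\t]+", s)
        # Missing values follow the "yes" branch when both targets match.
        if (out[4] == out[8]):
            missing_value_handling = (" or missing(" + out[1] + ")")
        else:
            missing_value_handling = ""
        if len(tabs) > 0:
            return (re.findall(r"[\t]+", s)[0].replace('\t', ' ') +
                    ' if state = ' + out[0] + ' then do;\n' +
                    re.findall(r"[\t]+", s)[0].replace('\t', ' ') +
                    ' if ' + out[1] + ' < ' + out[2] + missing_value_handling +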
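So basically, what I am trying to do is store the node number in the variable `state` and move to the leaf nodes accordingly (an approach I learned from the Shiutang-Li post mentioned in the links above).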
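To make the `state` idea concrete, here is a minimal Python sketch of the same walk that the generated SAS code is meant to perform. It is not Shiutang-Li's code. The names `TREE_DUMP`, `parse_tree`, and `score_row` are hypothetical, the dump text is made up in the layout of xgboost's text dump, and the regular expressions assume splits of the form `feature<threshold`.

```python
import re

# Hypothetical dump text for one tree, laid out like xgboost's text dump.
# The feature names come from the iris data; the numbers are made up.
TREE_DUMP = (
    "0:[petal_length<2.45] yes=1,no=2,missing=1\n"
    "\t1:leaf=-0.19\n"
    "\t2:[petal_width<1.75] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.12\n"
    "\t\t4:leaf=0.18\n"
)

SPLIT_RE = re.compile(
    r"(\d+):\[([^<\]]+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)"
)
LEAF_RE = re.compile(r"(\d+):leaf=([-+.\deE]+)")


def parse_tree(dump_text):
    """Map node id -> ("leaf", value) or ("split", feature, threshold, yes, no, missing)."""
    nodes = {}
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        leaf = LEAF_RE.match(line)
        if leaf:
            nodes[int(leaf.group(1))] = ("leaf", float(leaf.group(2)))
            continue
        split = SPLIT_RE.match(line)
        if split is None:
            raise ValueError("unrecognized dump line: " + line)
        node_id, feature, threshold, yes, no, missing = split.groups()
        nodes[int(node_id)] = (
            "split", feature, float(threshold), int(yes), int(no), int(missing)
        )
    return nodes


def score_row(nodes, row):
    """Walk one tree, keeping the current node id in `state`, and return the leaf value."""
    state = 0  # start at the root node
    while nodes[state][0] == "split":
        _, feature, threshold, yes, no, missing = nodes[state]
        value = row.get(feature)
        if value is None or value != value:  # None or NaN counts as missing
            state = missing
        elif value < threshold:
            state = yes
        else:
            state = no
    return nodes[state][1]


nodes = parse_tree(TREE_DUMP)
print(score_row(nodes, {"petal_length": 1.4, "petal_width": 0.2}))
print(score_row(nodes, {"petal_length": 5.0, "petal_width": 2.0}))
print(score_row(nodes, {"petal_length": 5.0, "petal_width": None}))
```

Running the same rows through this walk and through the SAS scoring code, tree by tree, should show which node first sends a row down a different branch.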