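Current situation:
I have a function that splits a binary class target variable into "1" and "0", then reads every independent variable for each class. For each independent variable, the function also fits a KDE per class ("1" and "0") and then computes the area where the two densities intersect: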
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def intersection_area(data, bandwidth, margin, target_variable_name):
    # target_variable_name is the column name of the response variable
    data = data.dropna()
    X = data.drop(columns=[str(target_variable_name)])
    names = list(X.columns)
    new_columns = []
    for column_name in names[:-1]:  # note: [:-1] skips the last feature column
        x0 = data.loc[data[str(target_variable_name)] == 0, str(column_name)]
        x1 = data.loc[data[str(target_variable_name)] == 1, str(column_name)]
        kde0 = gaussian_kde(x0, bw_method=bandwidth)
        kde1 = gaussian_kde(x1, bw_method=bandwidth)
        x_min = min(x0.min(), x1.min())  # lowest value across both classes
        x_max = max(x0.max(), x1.max())  # highest value across both classes
        dx = margin * (x_max - x_min)  # add a margin since the KDE is wider than the data
        x_min -= dx
        x_max += dx
        x = np.linspace(x_min, x_max, 500)
        kde0_x = kde0(x)
        kde1_x = kde1(x)
        inters_x = np.minimum(kde0_x, kde1_x)
        area_inters_x = np.trapz(inters_x, x)  # intersection of the two KDEs (np.trapezoid in NumPy >= 2.0)
        print(area_inters_x)
Problem: if I have n_class = 4, the function would look like this:
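One way to avoid writing out each class by hand is to loop over every pair of classes and reuse the same KDE-and-integrate step. The following is only a minimal sketch of that idea, not the function from the question: the name `pairwise_intersection_areas`, its parameters, the choice to return a dictionary instead of printing, and the synthetic `demo` data are all assumptions made for illustration. It also assumes the useful quantity for more than two classes is the pairwise overlap, which may not be the only reasonable definition.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde


def pairwise_intersection_areas(data, target, bandwidth=None, margin=0.1, n_points=500):
    """Hypothetical helper: {feature: {(class_a, class_b): overlap area}}."""
    data = data.dropna()
    classes = sorted(data[target].unique().tolist())
    features = [c for c in data.columns if c != target]
    areas = {}
    for feature in features:
        # One sample and one KDE per class, built once and reused for every pair.
        # gaussian_kde needs at least two distinct values per class.
        samples = {c: data.loc[data[target] == c, feature].to_numpy() for c in classes}
        kdes = {c: gaussian_kde(s, bw_method=bandwidth) for c, s in samples.items()}
        # Shared grid covering all classes, widened because KDE tails extend past the data.
        lo = min(s.min() for s in samples.values())
        hi = max(s.max() for s in samples.values())
        pad = margin * (hi - lo)
        grid = np.linspace(lo - pad, hi + pad, n_points)
        density = {c: kde(grid) for c, kde in kdes.items()}
        # Overlap of two densities = integral of their pointwise minimum.
        areas[feature] = {
            (a, b): float(trapezoid(np.minimum(density[a], density[b]), grid))
            for a, b in combinations(classes, 2)
        }
    return areas


# Synthetic four-class data, invented purely to exercise the sketch.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature": np.concatenate([rng.normal(loc, 1.0, 200) for loc in (0, 0.5, 5, 10)]),
    "label": np.repeat([0, 1, 2, 3], 200),
})
result = pairwise_intersection_areas(demo, "label")
```

With four classes this yields six pairs per feature. Classes whose samples sit close together (here 0 and 1) should give an area near 1, while well-separated classes (0 and 3) should give an area near 0.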