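I am trying to create a smoothed frequency distribution plot. The code works for one dataset, but for another dataset it raises the following error: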
    spl1 = make_interp_spline(bins1, data1['Frequency'].values)
  File "/<path_to_anaconda3>/envs/mlpy37/lib/python3.7/site-packages/scipy/interpolate/_bsplines.py", line 805, in make_interp_spline
    raise ValueError('x and y are incompatible.')
ValueError: x and y are incompatible.
Here is the code for the dataset that works:
import math
import numpy as np
import pandas as pd
import statistics
from scipy.stats import skew
from matplotlib import pyplot as plt
from scipy.interpolate import make_interp_spline
raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
min_value1 = min(raw_data1)
max_value1 = max(raw_data1)
step1 = math.ceil((max_value1 - min_value1) / 10)
bin_edges1 = [i for i in range(min_value1 - 1, max_value1 + 1, step1)]
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]
if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    bins1.append(max(bins1) + step1)
data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})
x1 = np.linspace(min(bins1), max(bins1), 250)
spl1 = make_interp_spline(bins1, data1['Frequency'].values)
smooth_curve1 = spl1(x1)
print(data1)
mean1 = statistics.mean(raw_data1)
median1 = statistics.median(raw_data1)
print('Mean: {:.2f}'.format(mean1))
print('Median: {:.2f}'.format(median1))
try:
    print('Mode: {:.2f}'.format(statistics.mode(raw_data1)))
except Exception as e:
    print(e)
skewness1 = skew(raw_data1)
if mean1 > median1:
    print('Positive Skewness: ' + str(skewness1))
elif mean1 < median1:
    print('Negative Skewness: ' + str(skewness1))
else:
    print('No skewness: ' + str(skewness1))
plt.figure()
plt.subplot(111)
plt.plot(x1, smooth_curve1)
plt.title('Numerical Variables Exercise Skewness')
plt.xlabel('Data')
plt.ylabel('Frequency')
plt.show()
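One likely cause of the error is a length mismatch. make_interp_spline needs one x value per y value, but pd.cut produces len(bin_edges1) - 1 intervals, while bins1 comes from a separate range call. The sketch below is pure Python and reproduces only the bin construction, so the two lengths can be compared without SciPy. The helper names build_bins and midpoints are hypothetical, and other_data is made-up data, not the failing dataset from the question.

```python
import math


def build_bins(raw_data, n_bins=10):
    """Rebuild bin_edges and bins the same way the code above does."""
    lo, hi = min(raw_data), max(raw_data)
    step = math.ceil((hi - lo) / n_bins)
    bin_edges = list(range(lo - 1, hi + 1, step))
    bins = list(range(lo, hi + 1, step))
    if max(bin_edges) < hi:
        bin_edges.append(max(bin_edges) + step)
        bins.append(max(bins) + step)
    return bin_edges, bins


def midpoints(bin_edges):
    """Return one x value per interval, i.e. len(bin_edges) - 1 values."""
    return [(a + b) / 2 for a, b in zip(bin_edges[:-1], bin_edges[1:])]


raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230,
             670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375,
             86, 168, 100]
edges1, bins1 = build_bins(raw_data1)
# The number of intervals is what pd.cut(...).value_counts() would return.
print('working data:', len(bins1), 'x values,', len(edges1) - 1, 'intervals')

# Made-up data for illustration only (an assumption, not from the question).
other_data = [1, 50, 101]
edges2, bins2 = build_bins(other_data)
print('other data:', len(bins2), 'x values,', len(edges2) - 1, 'intervals')

# Midpoints derived from the edges always match the interval count.
x2 = midpoints(edges2)
print('midpoints:', len(x2), 'x values')
```

If the two counts differ for the failing dataset, one option is to pass midpoints(bin_edges1) as x instead of bins1, since it always contains exactly one value per interval.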