scipy.interpolate.make_interp_spline gives "ValueError: x and y are incompatible"

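I'm trying to create a smooth frequency-distribution plot. The code works for one dataset, but for another dataset it gives the following error message: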


```python
spl1 = make_interp_spline(bins1, data1['Frequency'].values)
```

```
File "/<path_to_anaconda3>/envs/mlpy37/lib/python3.7/site-packages/scipy/interpolate/_bsplines.py", line 805, in make_interp_spline
    raise ValueError('x and y are incompatible.')
ValueError: x and y are incompatible.
```
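A minimal sketch of what the error means, assuming a reasonably recent SciPy: `make_interp_spline` needs exactly one y value per x value along the interpolation axis. The arrays below are made-up sample data. The exact wording of the message may differ between SciPy versions, but the exception type is `ValueError`.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # 5 sample positions (made-up data)
y = np.array([1.0, 3.0, 2.0, 5.0])       # only 4 values: one short

try:
    make_interp_spline(x, y)
    error_message = None
except ValueError as exc:
    # Message text varies by SciPy version; the type is ValueError.
    error_message = str(exc)

print(error_message)

# With matching lengths the same call succeeds, and the spline
# passes through every data point, e.g. (2.0, 2.0).
spl = make_interp_spline(x, np.append(y, 4.0))
print(spl(2.0))
```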

Here is the code for the dataset that works:


```python
import math
import numpy as np
import pandas as pd
import statistics
from scipy.stats import skew
from matplotlib import pyplot as plt
from scipy.interpolate import make_interp_spline

raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
min_value1 = min(raw_data1)
max_value1 = max(raw_data1)
step1 = math.ceil((max_value1 - min_value1) / 10)
bin_edges1 = [i for i in range(min_value1 - 1, max_value1 + 1, step1)]
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]
if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    bins1.append(max(bins1) + step1)
data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})
x1 = np.linspace(min(bins1), max(bins1), 250)
spl1 = make_interp_spline(bins1, data1['Frequency'].values)
smooth_curve1 = spl1(x1)

print(data1)
mean1 = statistics.mean(raw_data1)
median1 = statistics.median(raw_data1)
print('Mean: {:.2f}'.format(mean1))
print('Median: {:.2f}'.format(median1))
try:
    print('Mode: {:.2f}'.format(statistics.mode(raw_data1)))
except Exception as e:
    print(e)
skewness1 = skew(raw_data1)
if mean1 > median1:
    print('Positive Skewness: ' + str(skewness1))
elif mean1 < median1:
    print('Negative Skewness: ' + str(skewness1))
else:
    print('No skewness: ' + str(skewness1))

plt.figure()

plt.subplot(111)
plt.plot(x1, smooth_curve1)
plt.title('Numerical Variables Exercise Skewness')
plt.xlabel('Data')
plt.ylabel('Frequency')

plt.show()
```
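To see why the posted code happens to work for this dataset, the sketch below repeats the question's bin construction in plain Python and compares list lengths. `asker_bins` is a hypothetical helper name, and `[0, 20]` is a made-up two-value input, not the asker's second dataset (which isn't shown). `pd.cut` with N edges yields N - 1 intervals, so the spline only gets matching x and y when `len(bins) == len(edges) - 1`.

```python
import math

def asker_bins(raw_data):
    """Hypothetical helper: rebuild bin_edges and bins exactly as the question does."""
    min_value, max_value = min(raw_data), max(raw_data)
    step = math.ceil((max_value - min_value) / 10)
    bin_edges = list(range(min_value - 1, max_value + 1, step))
    bins = list(range(min_value, max_value + 1, step))
    if max(bin_edges) < max_value:
        bin_edges.append(max(bin_edges) + step)
        bins.append(max(bins) + step)
    return bin_edges, bins

raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870,
             366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]

edges, bins = asker_bins(raw_data1)
print(len(edges), len(bins))    # -> 11 10 (10 intervals, 10 x values: consistent)

edges2, bins2 = asker_bins([0, 20])  # made-up input, for illustration only
print(len(edges2), len(bins2))  # -> 12 12 (11 intervals, but 12 x values)
```

For `raw_data1`, max - min + 1 = 860 is an exact multiple of `step1` = 86, so the lists come out with lengths 11 and 10 and the `if` never fires. For the made-up input they come out equal, and appending to both lists keeps them equal.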





MYYA
1 Answer

潇湘沐

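Commenting out one line actually solves the problem (or at least the code runs; I can't verify the output). The error message is helpful: x and y should be the same length. pd.cut with N bin edges yields N - 1 intervals, so data1['Frequency'] has len(bin_edges1) - 1 values and bins1 needs exactly that many entries; appending to bins1 inside the if leaves it one entry too long.

```python
if max(bin_edges1) < max_value1:
    bin_edges1.append(max(bin_edges1) + step1)
    # bins1.append(max(bins1) + step1)  <-- this one
```

Also, your code is hard to follow because you mix your tools. You define raw_data1 as a Python list, and bins1 too, using a list comprehension:

```python
raw_data1 = [212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405, 230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84, 144, 48, 375, 86, 168, 100]
...
bins1 = [i for i in range(min_value1, max_value1 + 1, step1)]
```

Then you use numpy.linspace for x1:

```python
x1 = np.linspace(min(bins1), max(bins1), 250)
```

Pandas is involved as well:

```python
data1 = pd.DataFrame({'Frequency': pd.cut(raw_data1, bin_edges1).value_counts()})
```

I suggest working mainly with one of them and using the other tools only when necessary.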
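Following the suggestion to stick mainly to one tool, here is a minimal NumPy-only sketch (not the answerer's code). It assumes `np.histogram` with 10 equal-width bins is an acceptable replacement for the hand-built edges, so its bin boundaries differ slightly from the question's; note also that `np.histogram` uses half-open `[a, b)` bins (the last one closed), whereas `pd.cut` defaults to `(a, b]`. Because `np.histogram` always returns one more edge than counts, the bin centers match the counts in length by construction.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

raw_data1 = np.array([212, 869, 220, 654, 11, 624, 420, 121, 428, 865, 799, 405,
                      230, 670, 870, 366, 99, 55, 489, 312, 493, 163, 221, 84,
                      144, 48, 375, 86, 168, 100])

# 10 equal-width bins spanning [min, max]: counts has 10 entries, edges has 11.
counts, edges = np.histogram(raw_data1, bins=10)

# One x position per count: the midpoint of each bin.
centers = (edges[:-1] + edges[1:]) / 2

spl = make_interp_spline(centers, counts)
x_smooth = np.linspace(centers[0], centers[-1], 250)
smooth_curve = spl(x_smooth)

# Plot exactly as in the question, e.g. plt.plot(x_smooth, smooth_curve)
print(counts)
```

Using bin centers instead of the `bins1` left-edge values is a design choice: each count is drawn in the middle of the interval it describes. Summary statistics can stay in NumPy too (`np.mean`, `np.median`).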