Find the largest area under a curve | pandas, matplotlib

You can find the largest area below the 0 line. I generated my own data:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    x = np.random.randn(100000)
    x = x.cumsum() - x.mean()
    plt.plot(x);

Now calculate the start and end points of the positive and negative sequences. Every value in a sequence gets the same integer, and that integer increases from one sequence to the next, so the values can be grouped by sequence.

    x1 = np.diff(x < 0).cumsum()

Use pandas groupby to compute all the areas and find the largest negative one.

    df = pd.DataFrame({
        'value': x[1:],
        'border': x1
    })
    dfg = df.groupby('border')
    # np.trapz is named np.trapezoid from NumPy 2.0 onwards
    mingr = dfg.apply(lambda x: np.trapz(x.value)).idxmin()
    plt.plot(x[1:])
    plt.plot(
        dfg.get_group(mingr).value
    );
    plt.title(
        "position from {} to {}".format(
            dfg.get_group(mingr).index[0],
            dfg.get_group(mingr).index[-1]));

How does this work?

I created a data set that is easier to follow:

    x = np.array([3,4,4.5,3,2])
    X = np.r_[x,-x,x,-x]+np.random.normal(0,.2,20)
    plt.figure(figsize=(12,5))
    plt.axhline(0, color='gray')
    plt.plot(X, 'o--');

I want to know the sequences of consecutive negative or positive values. This can be achieved with the filter X < 0.

    df = pd.DataFrame({'value': X, 'lt_zero': X < 0})
    df[:10]

          value  lt_zero
    0  3.125986    False
    1  3.885588    False
    2  4.580410    False
    3  2.998920    False
    4  1.913088    False
    5 -2.902447     True
    6 -3.986654     True
    7 -4.373026     True
    8 -2.878661     True
    9 -1.929964     True

Now, by comparing each pair of consecutive values, I can find the indices where the sign changes. I concatenate a False in front of the data so that the first value is not lost.

    df['sign_switch'] = np.diff(np.r_[False, X < 0])
    df[:10]

          value  lt_zero  sign_switch
    0  3.125986    False        False
    1  3.885588    False        False
    2  4.580410    False        False
    3  2.998920    False        False
    4  1.913088    False        False
    5 -2.902447     True         True
    6 -3.986654     True        False
    7 -4.373026     True        False
    8 -2.878661     True        False
    9 -1.929964     True        False

With cumsum() I get an increasing integer value for each sequence. Now I have a grouping variable for every sequence.

    df['sign_sequence'] = np.diff(np.r_[False, X < 0]).cumsum()
    df[:10]

          value  lt_zero  sign_switch  sign_sequence
    0  3.125986    False        False              0
    1  3.885588    False        False              0
    2  4.580410    False        False              0
    3  2.998920    False        False              0
    4  1.913088    False        False              0
    5 -2.902447     True         True              1
    6 -3.986654     True        False              1
    7 -4.373026     True        False              1
    8 -2.878661     True        False              1
    9 -1.929964     True        False              1

For each group I can calculate the integral of the values in the group.

    sign_groups = df.groupby('sign_sequence')
    sign_groups.apply(lambda x: np.trapz(x.value))

    sign_sequence
    0    13.984455
    1   -13.654547
    2    14.370044
    3   -14.549090

You can access each group later and use the areas, for example to plot them.

    plt.figure(figsize=(12,5))
    plt.plot(X,'o--')
    plt.axhline(0, c='gray')
    for e,group in enumerate(sign_groups):
        # group is a (key, sub-DataFrame) tuple; shade the area between the curve and 0
        plt.fill_between(group[1].index,0, group[1].value)
        area = np.trapz(group[1].value)
        # label each of the four 5-point segments with its area
        plt.text((e)*5+1.5, np.sign(area) * 1.25, f'{area:.2f}', fontsize=12)
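The steps above can be wrapped into one small helper. This is only a minimal sketch using plain NumPy, not part of the original answer: the function name `largest_negative_area` and its return format are made up for illustration, and the `_trapezoid` alias is an assumption to cover both `np.trapezoid` (NumPy 2.x) and the older `np.trapz`.

```python
import numpy as np

# np.trapezoid exists in NumPy 2.x; older releases only provide np.trapz.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz


def largest_negative_area(values):
    """Return (start, stop, area) of the below-zero run with the most negative area.

    start and stop are inclusive positions in `values`.
    Returns None when no value is below zero.
    (Hypothetical helper, named here for illustration only.)
    """
    values = np.asarray(values, dtype=float)
    below = values < 0
    # Same labelling trick as in the answer: prepend False, diff, cumsum.
    run_id = np.diff(np.r_[False, below]).cumsum()
    best = None
    # Only visit the labels that belong to below-zero runs.
    for rid in np.unique(run_id[below]):
        idx = np.flatnonzero(run_id == rid)
        area = _trapezoid(values[idx])
        if best is None or area < best[2]:
            best = (int(idx[0]), int(idx[-1]), float(area))
    return best


if __name__ == "__main__":
    demo = [1, -1, -2, -1, 2, -3, -3, -3, -3, 1]
    print(largest_negative_area(demo))
```

Note that a run consisting of a single sample has a trapezoid area of 0, so isolated negative points never win. The loop scans the whole array once per run, so for long series with many sign changes the groupby version is likely the better choice.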
