Extracting patterns from a time series



I have the following dataset, a Pandas DataFrame:

         Score    min    max                 Date
    Loc
    0    2.757  0.000  2.757  2020-07-04 11:00:00
    3    2.723  2.723  0.000  2020-07-04 14:00:00
    8    2.724  2.724  0.000  2020-07-04 19:00:00
    11   2.752  0.000  2.752  2020-07-04 22:00:00
    13   2.742  2.742  0.000  2020-07-05 00:00:00
    15   2.781  0.000  2.781  2020-07-05 02:00:00
    18   2.758  2.758  0.000  2020-07-05 05:00:00
    20   2.865  0.000  2.865  2020-07-05 07:00:00
    24   2.832  0.000  2.832  2020-07-05 11:00:00
    25   2.779  2.779  0.000  2020-07-05 12:00:00
    29   2.775  2.775  0.000  2020-07-05 16:00:00
    34   2.954  0.000  2.954  2020-07-05 21:00:00
    37   2.886  2.886  0.000  2020-07-06 00:00:00
    48   3.101  0.000  3.101  2020-07-06 11:00:00
    53   3.012  3.012  0.000  2020-07-06 16:00:00
    55   3.068  0.000  3.068  2020-07-06 18:00:00
    61   2.970  2.970  0.000  2020-07-07 00:00:00
    64   3.058  0.000  3.058  2020-07-07 03:00:00

Where:

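Score is a very basic trend, and min and max are the local minima and maxima of Score.

Loc is the row's position on the chart's x-axis, and Date is that row's date on the chart.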
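One way to make the column meanings concrete is to rebuild the table as a DataFrame. This is a sketch that assumes a 0.000 in min or max is a placeholder meaning "this row is not that kind of extremum", which holds for every row shown; the `kind` column is a hypothetical helper, not part of the asker's data.

```python
import numpy as np
import pandas as pd

# The table from the question, typed in by hand: (Loc, Score, min, max, Date).
rows = [
    (0, 2.757, 0.000, 2.757, "2020-07-04 11:00:00"),
    (3, 2.723, 2.723, 0.000, "2020-07-04 14:00:00"),
    (8, 2.724, 2.724, 0.000, "2020-07-04 19:00:00"),
    (11, 2.752, 0.000, 2.752, "2020-07-04 22:00:00"),
    (13, 2.742, 2.742, 0.000, "2020-07-05 00:00:00"),
    (15, 2.781, 0.000, 2.781, "2020-07-05 02:00:00"),
    (18, 2.758, 2.758, 0.000, "2020-07-05 05:00:00"),
    (20, 2.865, 0.000, 2.865, "2020-07-05 07:00:00"),
    (24, 2.832, 0.000, 2.832, "2020-07-05 11:00:00"),
    (25, 2.779, 2.779, 0.000, "2020-07-05 12:00:00"),
    (29, 2.775, 2.775, 0.000, "2020-07-05 16:00:00"),
    (34, 2.954, 0.000, 2.954, "2020-07-05 21:00:00"),
    (37, 2.886, 2.886, 0.000, "2020-07-06 00:00:00"),
    (48, 3.101, 0.000, 3.101, "2020-07-06 11:00:00"),
    (53, 3.012, 3.012, 0.000, "2020-07-06 16:00:00"),
    (55, 3.068, 0.000, 3.068, "2020-07-06 18:00:00"),
    (61, 2.970, 2.970, 0.000, "2020-07-07 00:00:00"),
    (64, 3.058, 0.000, 3.058, "2020-07-07 03:00:00"),
]
df = pd.DataFrame(rows, columns=["Loc", "Score", "min", "max", "Date"]).set_index("Loc")
df["Date"] = pd.to_datetime(df["Date"])

# Assumption: 0.000 is a placeholder, so each row is exactly one kind of extremum.
df["kind"] = np.where(df["min"] > 0, "min", "max")

# The non-zero column always repeats Score, so Score alone is the extrema series.
assert ((df["min"] + df["max"]) == df["Score"]).all()
print(df["kind"].value_counts())
```

Indexed by Loc, `df["Score"]` then holds one value per local extremum, in chart order, which is the series a pattern detector would scan.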

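I'm trying to detect the data in the red box from my code, so that I can detect the same pattern in other datasets. Basically, what I'm looking for is a way to define that pattern in code so that it can be found in other data.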

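So far I have only managed to mark the local maxima and minima on the chart (the yellow and red dots). I also know how to define the pattern in my own words; I just need to do it in code: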

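1. Detect when a min/max point is far from the previous min/max point (so it has a noticeably higher value).
2. After that, find where the local minimum and maximum points are very close to each other and their values don't differ much.

In short: a strong rise followed by a range in which the score neither rises nor falls much.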
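The two-step definition above could be turned into a rule-based check. The sketch below is an illustration under stated assumptions, not code from the question or the answer: it works on the extrema only (Score values indexed by Loc), reads "far from the previous point" as a rise of at least `jump` over the immediately preceding extremum, and reads "close together and not very different" as following extrema at most `max_gap` Loc apart whose values stay within a band of width `band`. The function name `find_jump_then_flat` and every threshold are hypothetical and would need tuning on real data.

```python
from typing import List

import pandas as pd


def find_jump_then_flat(extrema: pd.Series, jump: float, band: float,
                        min_points: int, max_gap: int) -> List[int]:
    """Hypothetical detector for 'strong rise, then a flat range'.

    extrema: Score values at local minima/maxima, indexed by Loc (ascending).
    Returns the Loc of each extremum that starts such a pattern.
    """
    locs = extrema.index.to_list()
    vals = extrema.to_list()
    hits = []
    for i in range(1, len(vals)):
        # Step 1: this extremum is much higher than the previous one.
        if vals[i] - vals[i - 1] < jump:
            continue
        # Step 2: absorb the following extrema while they stay close in
        # Loc (gap <= max_gap) and inside a value band of width `band`.
        lo = hi = vals[i]
        count = 1
        for k in range(i + 1, len(vals)):
            if locs[k] - locs[k - 1] > max_gap:
                break
            new_lo, new_hi = min(lo, vals[k]), max(hi, vals[k])
            if new_hi - new_lo > band:
                break
            lo, hi = new_lo, new_hi
            count += 1
        if count >= min_points:
            hits.append(locs[i])
    return hits


# Toy series: a rise at Loc 4, then three extrema hovering around 2.0.
toy = pd.Series([1.0, 1.1, 2.0, 2.05, 1.98, 2.02, 2.5],
                index=[0, 2, 4, 5, 6, 7, 20])
print(find_jump_then_flat(toy, jump=0.5, band=0.1, min_points=3, max_gap=3))
# → [4]
```

On the 18 rows in the question's table, the arbitrary settings jump=0.1, band=0.1, min_points=3 and max_gap=5 flag Loc 20 and Loc 48. Because the rule compares raw Score differences, it carries over to other datasets only if their scores have a similar scale.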

I hope the question is clear enough; I can provide more details if needed. I don't know whether this is possible with NumPy or any other library.

慕桂英4014372


长风秋雁

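I think dynamic time warping (DTW) might work for you; I have used it for similar things. Essentially, it lets you measure how similar two time series are. These are the Python implementations I know of: fastdtw, dtw, dtw-python. The Towards Data Science article explaining DTW is a decent introduction to how it works.

You can use it to compare how similar an incoming time series is to the data in the red box. For example:

    import numpy as np
    from fastdtw import fastdtw

    # The event we're looking for
    event = np.array([10, 100, 50, 60, 50, 70])
    # A matching event occurring
    event2 = np.array([0, 7, 12, 4, 11, 100, 51, 62, 53, 72])
    # A non-matching event
    non_event = np.array([0, 5, 10, 5, 10, 20, 30, 20, 11, 9])

    # fastdtw returns the (approximate) DTW distance and the warping path,
    # a list of (i, j) index pairs matching event[i] to the other series[j]
    distance, path = fastdtw(event, event2)
    distance2, path2 = fastdtw(event, non_event)

This produces a set of index pairs where the two time series match best. At that point you can evaluate the match by any method you like. I took a rough look at the correlation of the values:

    def event_corr(event, event2, path):
        # Rough similarity score: the mean of event2[j] / event[i] over the
        # matched pairs (i, j). Despite the name, this is not a Pearson
        # correlation, and it assumes event contains no zeros.
        d = []
        for i, j in path:
            d.append((event2[j] * event[i]) / event[i] ** 2)
        return np.mean(d)

    print("Our event re-occurring is {:0.2f} correlated with our search event.".format(event_corr(event, event2, path)))
    print("Our non-event is {:0.2f} correlated with our search event.".format(event_corr(event, non_event, path2)))

Output:

    Our event re-occurring is 0.85 correlated with our search event.
    Our non-event is 0.45 correlated with our search event.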
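For readers who want to see what DTW computes without the fastdtw dependency, here is a plain-NumPy sketch. It is exact, quadratic-time DTW with the absolute difference as the local cost (fastdtw approximates this). It also swaps the answer's ratio-based score for a Pearson correlation over the aligned pairs, and adds a sliding-window search that scans a longer series for the window closest to a template, which is one way to locate a "red box" pattern in new data. The names `dtw`, `aligned_corr` and `best_window` are hypothetical.

```python
import numpy as np


def dtw(a, b):
    """Exact DTW between 1-D sequences, using |a[i] - b[j]| as the local cost.

    Returns (distance, path), where path lists the matched (i, j) index pairs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cost of aligning a[:i] with b[:j]; the border is inf.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = abs(a[i - 1] - b[j - 1]) + best_prev
    # Walk back from the corner, always stepping to the cheapest predecessor
    # (ties go to the diagonal step, which is listed first).
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        prev = {(i - 1, j - 1): acc[i - 1, j - 1],
                (i - 1, j): acc[i - 1, j],
                (i, j - 1): acc[i, j - 1]}
        i, j = min(prev, key=prev.get)
    return float(acc[n, m]), path[::-1]


def aligned_corr(a, b, path):
    """Pearson correlation of the value pairs matched by a DTW path."""
    ia, ib = zip(*path)
    x = np.asarray(a, dtype=float)[list(ia)]
    y = np.asarray(b, dtype=float)[list(ib)]
    return float(np.corrcoef(x, y)[0, 1])


def best_window(template, series):
    """Slide a window of len(template) over series; return (start, distance) of the closest one."""
    series = np.asarray(series, dtype=float)
    width = len(template)
    best_start, best_dist = -1, np.inf
    for start in range(len(series) - width + 1):
        dist, _ = dtw(template, series[start:start + width])
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


event = np.array([10, 100, 50, 60, 50, 70])
event2 = np.array([0, 7, 12, 4, 11, 100, 51, 62, 53, 72])
start, dist = best_window(event, event2)
window = event2[start:start + len(event)]
_, path = dtw(event, window)
print(start, dist, aligned_corr(event, window, path))
```

Because the distance is computed on raw values, the same shape at a different Score level counts as dissimilar; subtracting the mean from the template and from each window before calling `dtw` is one way to compare shapes only.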

