Pythonic way to convert a vector of dates into ranges?


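I have a Pandas DataFrame with one row per day and a few boolean columns. I want to convert them into a DataFrame that holds the ranges over which those columns are True.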

Example starting DataFrame:

import pandas as pd

t = True
f = False

df = pd.DataFrame(
    {'indic': [f, f, t, t, t, f, f, f, t, f, f, t, t, t, t]},
    index=pd.date_range("2018-01-01", "2018-01-15")
)
print(df)

            indic
2018-01-01  False
2018-01-02  False
2018-01-03   True
2018-01-04   True
2018-01-05   True
2018-01-06  False
2018-01-07  False
2018-01-08  False
2018-01-09   True
2018-01-10  False
2018-01-11  False
2018-01-12   True
2018-01-13   True
2018-01-14   True
2018-01-15   True

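The column of this DataFrame is True from 2018-01-03 to 2018-01-05, then on 2018-01-09 (a single day), and then from 2018-01-12 to 2018-01-15.

The output I am looking for in this example is the following DataFrame (date objects instead of strings would also be fine, even preferred):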

desired_result = pd.DataFrame({
    'from': ["2018-01-03", "2018-01-09", "2018-01-12"],
    'to': ["2018-01-05", "2018-01-09", "2018-01-15"]
})
print(desired_result)

         from          to
0  2018-01-03  2018-01-05
1  2018-01-09  2018-01-09
2  2018-01-12  2018-01-15

As an extension, in a later step I would like this to work for multiple columns, for example:

df = pd.DataFrame(
    {
        'indic_A': [f, f, t, t, t, f, f, f, t, f, f, t, t, t, t],
        'indic_B': [f, f, f, f, f, f, f, f, t, t, t, t, t, f, f]
    },
    index=pd.date_range("2018-01-01", "2018-01-15")
)

desired_result = pd.DataFrame({
    'from': ["2018-01-03", "2018-01-09", "2018-01-12", "2018-01-09"],
    'to': ["2018-01-05", "2018-01-09", "2018-01-15", "2018-01-13"],
    'what': ["indic_A", "indic_A", "indic_A", "indic_B"]
})
print(desired_result)

         from          to     what
0  2018-01-03  2018-01-05  indic_A
1  2018-01-09  2018-01-09  indic_A
2  2018-01-12  2018-01-15  indic_A
3  2018-01-09  2018-01-13  indic_B

Is there a pythonic, elegant way to do this, maybe even a pandas function?

明月笑刀无情


2 Answers

慕码人2483693

Reshape first with melt, then create a helper column of unique group ids with cumsum, keep only the True rows with boolean indexing, and aggregate with agg using the functions first and last:

df = df.rename_axis('date').reset_index().melt('date', var_name='ind', value_name='boolean')
df['new'] = (~df['boolean']).cumsum()
df = (df[df['boolean']]
         .groupby('new')
         .agg({'date':['first','last'], 'ind':'first'})
         .reset_index(drop=True))
df.columns = df.columns.map('_'.join)
print(df)

  date_first  date_last ind_first
0 2018-01-03 2018-01-05   indic_A
1 2018-01-09 2018-01-09   indic_A
2 2018-01-12 2018-01-15   indic_A
3 2018-01-09 2018-01-13   indic_B
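One thing to watch with the melt and cumsum approach: the run id is computed over the stacked columns, so if one column ends in True and the next column starts with True, the two runs would share an id and be merged. A minimal sketch of a variant that groups by both the column name and the run id is below. The function name true_ranges and the helper column names what, flag and run are my own choices for illustration, not part of the answer above, and the sketch assumes every column has bool dtype.

```python
import pandas as pd


def true_ranges(df):
    """Return one row per consecutive run of True values, per column."""
    long = (
        df.rename_axis('date')
          .reset_index()
          .melt('date', var_name='what', value_name='flag')
    )
    # The run id increases at every False, so consecutive True rows share an id.
    long['run'] = (~long['flag']).cumsum()
    return (
        long[long['flag']]
        # Grouping by the column name as well keeps runs from different
        # columns apart even when they happen to share a run id.
        .groupby(['what', 'run'])['date']
        .agg(['first', 'last'])
        .rename(columns={'first': 'from', 'last': 'to'})
        .reset_index()
        [['from', 'to', 'what']]
    )


t, f = True, False
df = pd.DataFrame(
    {
        'indic_A': [f, f, t, t, t, f, f, f, t, f, f, t, t, t, t],
        'indic_B': [f, f, f, f, f, f, f, f, t, t, t, t, t, f, f],
    },
    index=pd.date_range("2018-01-01", "2018-01-15"),
)
result = true_ranges(df)
print(result)
```

The from and to columns come out as datetime values rather than strings, which the question says is preferred.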


BIG阳

You can try pd.DataFrame.shift. First create two new columns, one shifted down and one shifted up:

df['down_shift'] = df['indic'].shift()
df['up_shift'] = df['indic'].shift(-1)

df will then look like this:

            indic down_shift up_shift
2018-01-01  False        NaN    False
2018-01-02  False      False     True
2018-01-03   True      False     True
2018-01-04   True       True     True
2018-01-05   True       True    False
2018-01-06  False       True    False
2018-01-07  False      False    False
2018-01-08  False      False     True
2018-01-09   True      False    False
2018-01-10  False       True    False
2018-01-11  False      False     True
2018-01-12   True      False     True
2018-01-13   True       True     True
2018-01-14   True       True     True
2018-01-15   True       True      NaN

The idea here is:

Case 1: (indic, down_shift) = (True, False): a range starts
Case 2: (indic, up_shift) = (True, False): a range ends
Case 3: case 1 and case 2 both hold: the range starts and ends on the same day

So we use this trick:

True - False = 1
False - True = -1
True - True = 0
False - False = 0

Code:

case_start = df['indic'] - df['down_shift']
case_end = df['indic'] - df['up_shift']
start_date_list = df[case_start == 1].index
end_date_list = df[case_end == 1].index

Then we check start_date_list:

DatetimeIndex(['2018-01-03', '2018-01-09', '2018-01-12'], dtype='datetime64[ns]', freq=None)

Then we check end_date_list:

DatetimeIndex(['2018-01-05', '2018-01-09'], dtype='datetime64[ns]', freq='4D')

The last date never changes from True to False (its up_shift value is NaN), so we need to add it manually.
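One possible way to avoid the manual fix for the boundary rows, sketched here as a variation on the shift idea rather than as the answerer's own code: pass fill_value=False to shift, so the day before the first row and the day after the last row are treated as False. The comparison then uses boolean operators instead of subtraction. The variable names prev_day, next_day, starts and ends are illustrative, and the sketch assumes a pandas version in which Series.shift accepts fill_value.

```python
import pandas as pd

t, f = True, False
df = pd.DataFrame(
    {'indic': [f, f, t, t, t, f, f, f, t, f, f, t, t, t, t]},
    index=pd.date_range("2018-01-01", "2018-01-15"),
)

s = df['indic']
# Treat the missing neighbours at both ends of the series as False.
prev_day = s.shift(1, fill_value=False)
next_day = s.shift(-1, fill_value=False)

# A range starts where today is True and yesterday was not.
starts = s[s & ~prev_day].index
# A range ends where today is True and tomorrow is not.
ends = s[s & ~next_day].index

# Starts and ends pair up in order, one pair per run of True values.
result = pd.DataFrame({'from': starts, 'to': ends})
print(result)
```

Because every run of True values has exactly one start and one end, the two indexes have the same length and line up row by row, including a run that reaches the last date.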

