
How do I shift a Series index by 1 row within another pandas TimeIndex?

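I have a pd.DatetimeIndex named "raw_Ix" that contains all the indices I am working with, and two pandas (time) Series, "t1" and "nextloc_ixS", which share the same time index. The values in "nextloc_ixS" are that shared index (t1.index, which equals nextloc_ixS.index) shifted by one position within raw_Ix. To make it clearer what "nextloc_ixS" is: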

   nextloc_ixS =  t1.index[np.searchsorted(raw_Ix, t1.index)+1]
   nextloc_ixS = pd.DataFrame(nextloc_ixS, index = t1.index)
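
To make the definition concrete, here is a minimal sketch on made-up data. It assumes the intent is to find, for each timestamp in t1.index, the entry that follows it in raw_Ix, so it indexes raw_Ix (rather than t1.index) with the shifted positions. The sample timestamps and the clipping of the last position are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: raw_Ix holds every timestamp, t1 only some of them.
raw_Ix = pd.DatetimeIndex(pd.date_range("2020-01-01", periods=6, freq="D"))
t1 = pd.Series([10, 20, 30], index=raw_Ix[[0, 2, 4]])

# Position of each t1 timestamp inside raw_Ix (raw_Ix must be sorted).
pos = np.searchsorted(raw_Ix.values, t1.index.values)

# Move one position forward; clip so the last entry cannot run off the end.
next_pos = np.minimum(pos + 1, len(raw_Ix) - 1)

# Same index as t1, values are the "next" timestamps taken from raw_Ix.
nextloc_ixS = pd.Series(raw_Ix[next_pos], index=t1.index)
print(nextloc_ixS)
```

With this data, 2020-01-01 maps to 2020-01-02, 2020-01-03 to 2020-01-04, and 2020-01-05 to 2020-01-06.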

All three are passed to a function, where I need them in the following form:

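  1. I need to drop the rows of t1 whose index is not in raw_Ix (to avoid errors, since raw_Ix may have been manipulated).

  2. After that, I make a deep copy of t1 (call it t1_copy), because I need the values of nextloc_ixS as the new DatetimeIndex of t1_copy. (This sounds easy, but it is where I got stuck.)

  3. Before replacing the index, though, I probably need to save the old index of t1_copy as a column in t1_copy, for the last step (step 5).

  4. The actual function then selects some of the indices of t1_copy in a specific procedure and returns "result", a pd.DatetimeIndex that contains some of the indices of t1_copy, including duplicates.

  5. I need to shift result back by 1, but not via np.searchsorted. (Note: "result" is still artificially shifted forward, so we can shift it back by getting its positions in t1_copy.index and then looking up the "old" index in the backup column from step 3.)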
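
The five steps above can be sketched on a small made-up example. This is only an illustration under assumptions: the sample timestamps, the column name "value", and the hard-coded `result` (a stand-in for whatever the real selection in step 4 returns) are hypothetical. It relies on the shifted index being unique, so that a label lookup with `.loc` can map duplicated entries of `result` back to their old timestamps.

```python
import pandas as pd

raw_Ix = pd.date_range("2020-01-01", periods=6, freq="D")

# Hypothetical inputs: the last timestamp of t1 (2020-01-10) is not in raw_Ix.
t1 = pd.Series(
    [10, 20, 30, 40],
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-03", "2020-01-05", "2020-01-10"]),
)
nextloc_ixS = pd.Series(
    pd.DatetimeIndex(["2020-01-02", "2020-01-04", "2020-01-06", "2020-01-11"]),
    index=t1.index,
)

# Step 1: keep only the rows whose timestamp is present in raw_Ix.
keep = t1.index.isin(raw_Ix)

# Steps 2-3: copy into a frame, back up the old index, then swap in the shifted one.
t1_copy = t1[keep].to_frame("value")
t1_copy["index_old"] = t1_copy.index
t1_copy.index = pd.DatetimeIndex(nextloc_ixS[keep])

# Step 4: stand-in for the real selection; note the duplicate entry.
result = pd.DatetimeIndex(["2020-01-04", "2020-01-02", "2020-01-04"])

# Step 5: map the shifted timestamps back to the old ones via the backup column.
result_back = pd.DatetimeIndex(t1_copy.loc[result, "index_old"])
print(result_back)
```

The lookup in step 5 replaces the second searchsorted call: the backup column acts as a mapping from shifted timestamp to original timestamp.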

I know this sounds a bit complicated, so here is the inefficient code I have been working with:

    import pandas as pd

    def main(raw_Ix, t1, nextloc, nextloc_ixS=None):
        raw_Ix = pd.DatetimeIndex(raw_Ix)
        t1_copy = t1.copy(deep=True).to_frame()

        if nextloc_ixS is not None:
            # convert only after the None check, otherwise None.to_frame() raises
            nextloc_ixS = nextloc_ixS.to_frame()
            # drop the rows of t1 whose timestamp is not in raw_Ix
            t1_copy = t1_copy.loc[t1_copy.index.intersection(raw_Ix)]
            t1_copy = t1_copy[~t1_copy.index.duplicated(keep='first')]  # somehow duplicates came up, I couldn't explain why
            # back up the old index before it is replaced
            t1_copy["index_old"] = t1_copy.index.copy(deep=True)
            temp = nextloc_ixS.loc[nextloc_ixS.index.intersection(raw_Ix)].copy(deep=True)
            temp = temp[~temp.index.duplicated(keep='first')]  # somehow duplicates came up here too
            # the single column of temp holds the shifted timestamps
            t1_copy.set_index(pd.DatetimeIndex(temp.iloc[:, 0]), inplace=True)
        else:  # in this case we should just find the intersection
            t1_copy = t1_copy.loc[t1.index.intersection(raw_Ix)]
            t1_copy = t1_copy[~t1_copy.index.duplicated(keep='first')]




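So, in short: I am trying to shift the index forward and back again while avoiding np.searchsorted(), using the two pd.Series instead (or better, call them columns, since they are passed in separately from a DataFrame).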


Is there a way to do this efficiently, in terms of both lines of code and run time? (The number of rows is very large.)


陪伴而非守候
1 answer

慕村225694

Your logic is complex; it comes down to two things:

  1. Removing the rows that are not in the list. I used a trick for this so that I can use dropna().

  2. A shift() of the column.

This performs well: a fraction of a second on a dataset of more than 0.5 million rows.

    import datetime as dt
    import random
    import time

    import numpy as np
    import pandas as pd

    d = [d for d in pd.date_range(dt.datetime(2015,5,1,2),
                                  dt.datetime(2020,5,1,4), freq="128s")
         if random.randint(0,3) < 2 ] # miss some sample times...

    # random manipulation of rawIdx so there are some rows where ts is not in rawIdx
    df = pd.DataFrame({"ts":d, "rawIdx":[x if random.randint(0,3)<=2
                                         else x + pd.Timedelta(1, unit="s") for x in d],
                       "val":[random.randint(0,50) for x in d]}).set_index("ts")

    start = time.time()
    print(f"size before: {len(df)}")
    dfc = df.assign(
        # make it float64 so can have nan, map False to nan so can dropna() rows that are not in rawIdx
        issue=lambda dfa: np.array(np.where(dfa.index.isin(dfa["rawIdx"]),True, np.nan), dtype="float64"),
    ).dropna().drop(columns="issue").assign(
        # this should be just a straight forward shift.  rawIdx will be same as index due to dropna()
        nextloc_ixS=df.rawIdx.shift(-1)
    )
    print(f"size after: {len(dfc)}\ntime: {time.time()-start:.2f}s\n\n{dfc.head().to_string()}")

Output

    size before: 616264
    size after: 462207
    time: 0.13s

                                     rawIdx  val         nextloc_ixS
    ts
    2015-05-01 02:02:08 2015-05-01 02:02:08   33 2015-05-01 02:06:24
    2015-05-01 02:06:24 2015-05-01 02:06:24   40 2015-05-01 02:08:33
    2015-05-01 02:10:40 2015-05-01 02:10:40   15 2015-05-01 02:12:48
    2015-05-01 02:12:48 2015-05-01 02:12:48   45 2015-05-01 02:17:04
    2015-05-01 02:17:04 2015-05-01 02:17:04   14 2015-05-01 02:21:21
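
In the code above, shift(-1) is applied to the unfiltered df.rawIdx, which is why the row for 02:06:24 in the sample output points at 02:08:33, a timestamp that was itself dropped. If the intent is for each row to point at the next surviving row, one possible variant is to filter first and shift inside the same chain. This is a sketch on a small deterministic frame; the sample values and the plain boolean mask in place of the dropna() trick are assumptions, not the answerer's code.

```python
import pandas as pd

# Five regular timestamps; row 2 of rawIdx is nudged by 1s so it no longer matches.
ts = pd.DatetimeIndex(pd.date_range("2015-05-01 02:00:00", periods=5, freq="128s"))
rawIdx = list(ts)
rawIdx[2] = rawIdx[2] + pd.Timedelta(1, unit="s")

df = pd.DataFrame({"ts": ts, "rawIdx": rawIdx, "val": [1, 2, 3, 4, 5]}).set_index("ts")

dfc = (
    # keep only rows whose timestamp still appears in rawIdx
    df[df.index.isin(df["rawIdx"])]
    # shift on the filtered frame, so "next" means the next surviving row
    .assign(nextloc_ixS=lambda dfa: dfa["rawIdx"].shift(-1))
)
print(dfc.to_string())
```

Here the row for 02:02:08 gets 02:06:24 as its next location, skipping the dropped 02:04:16 row, and the last row gets NaT because nothing follows it.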