猿问

将每个数据帧行切片为 3 个具有不同切片范围的窗口

我想将我的数据帧的每一行切片成 3 个窗口,切片索引存储在另一个数据帧中,并针对数据帧的每一行进行更改。之后我想以 MultiIndex 的形式返回一个包含窗口的数据帧。每个窗口中比窗口中最长的行短的行应该用 NaN 值填充。由于我的实际数据框有大约 100.000 行和 600 列,我很关心一个有效的解决方案。


考虑以下示例:


这是我的数据框,我想将其分成 3 个窗口


>>> df

  0  1  2  3  4  5  6  7

0 0  1  2  3  4  5  6  7

1 8  9  10 11 12 13 14 15

2 16 17 18 19 20 21 22 23

第二个数据框包含我的切片索引,其行数与df:


>>> df_slice

  0 1

0 3 5

1 2 6

2 4 7

我试过切片窗户,像这样:


first_window = df.iloc[:, :df_slice.iloc[:, 0]]

first_window.columns = pd.MultiIndex.from_tuples([("A", c) for c in first_window.columns])


second_window = df.iloc[:, df_slice.iloc[:, 0] : df_slice.iloc[:, 1]]

second_window.columns = pd.MultiIndex.from_tuples([("B", c) for c in second_window.columns])


third_window = df.iloc[:, df_slice.iloc[:, 1]:]

third_window.columns = pd.MultiIndex.from_tuples([("C", c) for c in third_window.columns])

result = pd.concat([first_window,

                    second_window,

                    third_window], axis=1)

这给了我以下错误:


TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [0    3

1    2

2    4

Name: 0, dtype: int64] of <class 'pandas.core.series.Series'>

我的预期输出是这样的:


>>> result

    A                   B                   C           

    0   1     2     3   4   5     6     7   8     9    10

0   0   1     2   NaN   3   4   NaN   NaN   5     6    7

1   8   9   NaN   NaN  10  11    12    13  14    15  NaN

2  16  17    18    19  20  21    22   NaN  23   NaN  NaN

在不遍历数据帧的每一行的情况下,是否有一个有效的解决方案来解决我的问题?


慕码人2483693
浏览 61回答 1
1回答

隔江千里

这是一个解决方案,使用meltand thenpivot_table加上一些逻辑来:确定三组“A”、“B”和“C”。将列向左移动,以便 NaN 仅出现在每个窗口的右侧。重命名列以获得预期的输出。&nbsp; &nbsp; t = df.reset_index().melt(id_vars="index")&nbsp; &nbsp; t = pd.merge(t, df_slice, left_on="index", right_index=True)&nbsp; &nbsp; t.variable = pd.to_numeric(t.variable)&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; t.loc[t.variable < t.c_0,"group"] = "A"&nbsp; &nbsp; t.loc[(t.variable >= t.c_0) & (t.variable < t.c_1), "group"] = "B"&nbsp; &nbsp; t.loc[t.variable >= t.c_1, "group"] = "C"&nbsp; &nbsp; # shift relevant values to the left&nbsp; &nbsp; shift_val = t.groupby(["group", "index"]).variable.transform("min") - t.groupby(["group"]).variable.transform("min")&nbsp; &nbsp; t.variable = t.variable - shift_val&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; # extract a, b, and c groups, and create a multi-level index for their&nbsp; &nbsp; # columns&nbsp; &nbsp; df_a = pd.pivot_table(t[t.group == "A"], index= "index", columns="variable", values="value")&nbsp; &nbsp; df_a.columns = pd.MultiIndex.from_product([["a"], df_a.columns])&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; df_b = pd.pivot_table(t[t.group == "B"], index= "index", columns="variable", values="value")&nbsp; &nbsp; df_b.columns = pd.MultiIndex.from_product([["b"], df_b.columns])&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; df_c = pd.pivot_table(t[t.group == "C"], index= "index", columns="variable", values="value")&nbsp; &nbsp; df_c.columns = pd.MultiIndex.from_product([["c"], df_c.columns])&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; res = pd.concat([df_a, df_b, df_c], axis=1)&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; res.columns = pd.MultiIndex.from_tuples([(c[0], i) for i, c in enumerate(res.columns)])&nbsp; &nbsp;&nbsp;&nbsp; &nbsp; print(res)输出是:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; a&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;b&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp;1&nbsp; &nbsp; &nbsp;2&nbsp; &nbsp; &nbsp;3&nbsp; &nbsp; &nbsp;4&nbsp; &nbsp; &nbsp;5&nbsp; &nbsp; &nbsp;6&nbsp; &nbsp; &nbsp;7&nbsp; &nbsp; &nbsp;8&nbsp; &nbsp; &nbsp;9&nbsp; &nbsp; 10index&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp;0.0&nbsp; &nbsp;1.0&nbsp; &nbsp;2.0&nbsp; &nbsp;NaN&nbsp; &nbsp;3.0&nbsp; &nbsp;4.0&nbsp; &nbsp;NaN&nbsp; &nbsp;NaN&nbsp; &nbsp;5.0&nbsp; &nbsp;6.0&nbsp; 7.01&nbsp; &nbsp; &nbsp; &nbsp;8.0&nbsp; &nbsp;9.0&nbsp; &nbsp;NaN&nbsp; &nbsp;NaN&nbsp; 10.0&nbsp; 11.0&nbsp; 12.0&nbsp; 13.0&nbsp; 14.0&nbsp; 15.0&nbsp; NaN2&nbsp; &nbsp; &nbsp; 16.0&nbsp; 17.0&nbsp; 18.0&nbsp; 19.0&nbsp; 20.0&nbsp; 21.0&nbsp; 22.0&nbsp; &nbsp;NaN&nbsp; 23.0&nbsp; &nbsp;NaN&nbsp; NaN
随时随地看视频慕课网APP

相关分类

Python
我要回答