猿问

如何使用 Pandas 合并具有重复时间戳的元素

我有以下代表加密交易的数据:


1599177600000,381.52,1.425,s

1599177600000,381.49,0.828,s

1599177600000,381.48,0.747,s

1599177600212,381.53,3.225,s

1599177600560,381.53,0.226,s

1599177600560,381.45,0.637,s

1599177600560,381.44,11.431,s

1599177600560,381.38,2.153,s

1599177600560,381.37,0.569,s

1599177600560,381.35,150,s

1599177600560,381.33,1.056,s

1599177600560,381.32,8.581,s

1599177600560,381.31,16.947,s

1599177600560,381.29,15.877,s

1599177600590,381.45,2.586,s

1599177600652,381.54,0.03,b

1599177600826,381.39,0.5,s

1599177601166,381.39,0.139,s

1599177601304,381.39,1.445,s

1599177601306,381.35,2.555,s

1599177601624,381.3,1.552,s

1599177601706,381.29,2,s

1599177601868,381.31,0.262,s

1599177602108,381.29,0.092,s

1599177602242,381.3,0.05,b

1599177602296,381.31,2.228,b

1599177602312,381.32,0.05,b

1599177602386,381.33,0.639,b

1599177602388,381.29,7.901,s

1599177602388,381.25,12.099,s

这些列是:unix 时间戳(毫秒)、价格、数量和代表交易是买入还是卖出事件的字母(b 或 s)。


使用 Pandas,如何将具有相同时间戳的行合并在一起,同时添加额外的列?


合并规则为:


new quantity = sum quantity for all rows

new price = sum (quantity * price) for all rows / new quantity

例外的是:


if there is a duplicate timestamp with different letters, the one with the letter 'b' has to be pushed ahead by 1ms

额外的列是:


if a row is a result of a merge, the extra columns needs to have a bool True in it

然后使用该时间戳作为索引?


我不确定这是否可以一次性完成,但我也不太熟悉 Pandas 的语法来弄清楚如何做到这一点,所以任何带有解释的答案都会很棒。


慕田峪4524236
浏览 100回答 2
2回答

杨魅力

这是一种方法:w_avg = df.groupby("time").apply(lambda d: sum(d["price"]*d["volume"])/d["volume"].sum())s = df.loc[df["time"].duplicated(keep=False),"time"].unique()df = df.groupby("time", as_index=False).agg({"volume": "sum"})print (df.assign(w_avg=df["time"].map(w_avg), boolean=df["time"].isin(s)))             time   volume       w_avg  boolean0   1599177600000    3.000  381.501760     True1   1599177600212    3.225  381.530000    False2   1599177600560  207.477  381.346627     True3   1599177600590    2.586  381.450000    False4   1599177600652    0.030  381.540000    False5   1599177600826    0.500  381.390000    False6   1599177601166    0.139  381.390000    False7   1599177601304    1.445  381.390000    False8   1599177601306    2.555  381.350000    False9   1599177601624    1.552  381.300000    False10  1599177601706    2.000  381.290000    False11  1599177601868    0.262  381.310000    False12  1599177602108    0.092  381.290000    False13  1599177602242    0.050  381.300000    False14  1599177602296    2.228  381.310000    False15  1599177602312    0.050  381.320000    False16  1599177602386    0.639  381.330000    False17  1599177602388   20.000  381.265802     True

30秒到达战场

IIUC,这是一种包含业务逻辑的方法(使用加权平均值;如果同时有买入和卖出,则向前移动一毫秒的时间戳):# create data framedf = pd.read_csv(StringIO(data), sep=',')#df['timestamp'] -= df['timestamp'].min()# find buy, sell timestampsbuy_timestamps = df.loc[ df['buy_sell'] == 'b', 'timestamp']sell_timestamps = df.loc[ df['buy_sell'] == 's', 'timestamp']bs_timestamps = set(buy_timestamps) & set(sell_timestamps)# adjust timestampsdf.loc[ df['timestamp'].isin(bs_timestamps), 'timestamp' ] += 1# helper columnsdf['price_quantity'] = df['price'] * df['quantity']df['multiple_trades'] = df.groupby('timestamp')['buy_sell'].transform('count')现在执行聚合计算:g = df.groupby(['timestamp', 'buy_sell'])t = (pd.concat([    (g['price_quantity'].sum() / g['quantity'].sum()).rename('price'),    g['quantity'].sum().rename('quantity'),    g['timestamp'].count().apply(lambda x: True if x > 1 else False).rename('trade_count')],     axis=1)     .reset_index()     .filter(['timestamp', 'price', 'buy_sell', 'trade_count'])    )print(t)结果是:        timestamp       price buy_sell  trade_count0   1599177600000  381.501760        s         True1   1599177600212  381.530000        s        False2   1599177600560  381.346627        s         True3   1599177600590  381.450000        s        False4   1599177600652  381.540000        b        False5   1599177600826  381.390000        s        False6   1599177601166  381.390000        s        False7   1599177601304  381.390000        s        False8   1599177601306  381.350000        s        False9   1599177601624  381.300000        s        False10  1599177601706  381.290000        s        False11  1599177601868  381.310000        s        False12  1599177602108  381.290000        s        False13  1599177602242  381.300000        b        False14  1599177602296  381.310000        b        False15  1599177602312  381.320000        b        False16  1599177602386  381.330000        b        False17  1599177602388  381.265802        s         True
随时随地看视频慕课网APP

相关分类

Python
我要回答