Finding the most profitable long/short pair permutations in Python

This technically isn't an answer, since it doesn't solve the problem with optimization techniques, but hopefully someone will find it useful.

From testing, building and joining the DataFrames is the slow part. Using NumPy to create the pair-price matrix is very fast:

```python
arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
```

This generates a matrix of every node by every node. Entry [i, j] is the profit of node j minus the profit of node i, so the column node is the long leg and the row node is the short leg. The values shown here are the totals over all four dates of the full example further down:

+---+-------------+------------+------------+------------+
|   | 0           | 1          | 2          | 3          |
+---+-------------+------------+------------+------------+
| 0 | 0.000000    | 149.635831 | 78.598163  | 101.525670 |
+---+-------------+------------+------------+------------+
| 1 | -149.635831 | 0.000000   | -71.037668 | -48.110161 |
+---+-------------+------------+------------+------------+
| 2 | -78.598163  | 71.037668  | 0.000000   | 22.927507  |
+---+-------------+------------+------------+------------+
| 3 | -101.525670 | 48.110161  | -22.927507 | 0.000000   |
+---+-------------+------------+------------+------------+

If you construct an empty NumPy array with dimensions `number of nodes` * `number of nodes`, you can simply add each daily array to the totals array:

```python
total_arr = np.zeros((4, 4))

# Do this for each day
arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
total_arr += arr
```

Once you have that, you need some Pandas voodoo to assign the node names to the matrix and break the matrix down into individual long/short/profit rows.

My initial (exhaustive) search took 47 minutes with 60 days of data. This brings it down to 13 seconds.

Full working example:

```python
import numpy as np
import pandas as pd

profits = [
    {'date':'2019-11-18', 'node':'A', 'profit': -79.629698},
    {'date':'2019-11-19', 'node':'A', 'profit': -17.452517},
    {'date':'2019-11-20', 'node':'A', 'profit': -19.069558},
    {'date':'2019-11-21', 'node':'A', 'profit': -66.061564},
    {'date':'2019-11-18', 'node':'B', 'profit': -87.698670},
    {'date':'2019-11-19', 'node':'B', 'profit': -73.812616},
    {'date':'2019-11-20', 'node':'B', 'profit': 198.513246},
    {'date':'2019-11-21', 'node':'B', 'profit': -69.579466},
    {'date':'2019-11-18', 'node':'C', 'profit': 66.3022870},
    {'date':'2019-11-19', 'node':'C', 'profit': -16.132065},
    {'date':'2019-11-20', 'node':'C', 'profit': -123.73898},
    {'date':'2019-11-21', 'node':'C', 'profit': -30.046416},
    {'date':'2019-11-18', 'node':'D', 'profit': -131.68222},
    {'date':'2019-11-19', 'node':'D', 'profit': 13.2964730},
    {'date':'2019-11-20', 'node':'D', 'profit': 23.5950530},
    {'date':'2019-11-21', 'node':'D', 'profit': 14.1030270},
]

# Initialize a Numpy array of node_length * node_length dimension
profits_df = pd.DataFrame(profits)
nodes = profits_df['node'].unique()
total_arr = np.zeros((len(nodes), len(nodes)))

# For each date, calculate the pairs profit matrix and add it to the total.
# This relies on every date listing the same nodes in the same order as `nodes`.
for date, date_df in profits_df.groupby('date'):
    df = date_df[['node', 'profit']].reset_index()
    # arr[i, j] = profit[j] - profit[i] (long column node, short row node)
    arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
    total_arr += arr

# This will label each column and row
nodes_series = pd.Series(nodes, name='node')
perms_df = pd.concat((nodes_series, pd.DataFrame(total_arr, columns=nodes_series)), axis=1)

# This collapses our matrix back to long, short, and profit rows with the proper column names
perms_df = perms_df.set_index('node').unstack().to_frame(name='profit').reset_index()
perms_df = perms_df.rename(columns={'level_0': 'long', 'node': 'short'})

# Get rid of long/short pairs where the nodes are the same (not technically necessary)
perms_df = perms_df[perms_df['long'] != perms_df['short']]

# Let's see our profit
print(perms_df.sort_values('profit', ascending=False))
```

Result:

+----+------+-------+-------------+
|    | long | short | profit      |
+----+------+-------+-------------+
| 4  | B    | A     | 149.635831  |
+----+------+-------+-------------+
| 12 | D    | A     | 101.525670  |
+----+------+-------+-------------+
| 8  | C    | A     | 78.598163   |
+----+------+-------+-------------+
| 6  | B    | C     | 71.037668   |
+----+------+-------+-------------+
| 7  | B    | D     | 48.110161   |
+----+------+-------+-------------+
| 14 | D    | C     | 22.927507   |
+----+------+-------+-------------+
| 11 | C    | D     | -22.927507  |
+----+------+-------+-------------+
| 13 | D    | B     | -48.110161  |
+----+------+-------+-------------+
| 9  | C    | B     | -71.037668  |
+----+------+-------+-------------+
| 2  | A    | C     | -78.598163  |
+----+------+-------+-------------+
| 3  | A    | D     | -101.525670 |
+----+------+-------+-------------+
| 1  | A    | B     | -149.635831 |
+----+------+-------+-------------+

Thanks to sammywemmy for helping me organize the problem and come up with something useful.
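One possible simplification, offered as a sketch rather than as part of the original answer: every entry of the daily matrix is a difference of two node profits, so the sum of the daily matrices equals the pair matrix built from each node's total profit. Under that assumption (pair profit is long minus short, summed over all dates), a single broadcast over the per-node totals should be enough. The helper name `pair_profits` and the three-node sample below are hypothetical and used for illustration only.

```python
import numpy as np
import pandas as pd


def pair_profits(profits_df: pd.DataFrame) -> pd.DataFrame:
    """Rank every ordered (long, short) pair of distinct nodes by total profit.

    Hypothetical helper: expects 'node' and 'profit' columns, as in the
    example data of the answer.
    """
    # Total profit per node over all dates. Grouping by label means the
    # row order within each date does not matter.
    totals = profits_df.groupby('node')['profit'].sum()
    labels = totals.index.to_numpy()
    values = totals.to_numpy()
    n = len(labels)

    # matrix[i, j] = total[i] - total[j]: long node i, short node j.
    matrix = values[:, None] - values[None, :]

    # Flatten row by row: position i * n + j holds the pair (i, j).
    pairs = pd.DataFrame({
        'long': np.repeat(labels, n),
        'short': np.tile(labels, n),
        'profit': matrix.ravel(),
    })

    # Drop pairs where a node is matched with itself, then rank.
    pairs = pairs[pairs['long'] != pairs['short']]
    return pairs.sort_values('profit', ascending=False, ignore_index=True)


# Small illustrative sample (a subset of the example data in the answer).
example = pd.DataFrame([
    {'date': '2019-11-18', 'node': 'A', 'profit': -79.629698},
    {'date': '2019-11-19', 'node': 'A', 'profit': -17.452517},
    {'date': '2019-11-18', 'node': 'B', 'profit': -87.698670},
    {'date': '2019-11-19', 'node': 'B', 'profit': -73.812616},
    {'date': '2019-11-18', 'node': 'C', 'profit': 66.302287},
    {'date': '2019-11-19', 'node': 'C', 'profit': -16.132065},
])
ranked = pair_profits(example)
print(ranked)
```

Because the totals are indexed by node label, this version does not depend on every date listing the nodes in the same order, which the per-date loop does rely on.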
