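I have a dataframe with hundreds of columns, where each column label is a client ID, and a single row holding the total number of tickets for each client ID. It looks like this (df1 is the result of several transformations of the original csv file):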
df1
+-----+----+-----+
| 30 | 5 | 100 |
+-----+----+-----+
| 122 | 40 | 13 |
+-----+----+-----+
I have another dataframe with 2 columns, account_id and client_id, like this:
df2
+------------+-----------+
| account_id | client_id |
+------------+-----------+
| 4char | 4 |
+------------+-----------+
| 3char | 5 |
+------------+-----------+
| 2char | 30 |
+------------+-----------+
| 16char | 9 |
+------------+-----------+
| 17char | 100 |
+------------+-----------+
I would like to end up with a single file containing 3 columns, account_id, client_id and total_tickets, like this:
df
+------------+-----------+---------------+
| account_id | client_id | total_tickets |
+------------+-----------+---------------+
| 4char | 4 | null |
+------------+-----------+---------------+
| 3char | 5 | 40 |
+------------+-----------+---------------+
| 2char | 30 | 122 |
+------------+-----------+---------------+
| 16char | 9 | null |
+------------+-----------+---------------+
| 17char | 100 | 13 |
+------------+-----------+---------------+
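So far I have reached this point: I wrote a function that iterates over both dataframes with iterrows(), uses isin() to check whether each client_id of df2 is found among the columns of df1, and then adds a new column, total_tickets, to df2 with the assign() function.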
# f1 = df1, f2 = df2
def populating_df(f1, f2):
    for org_nr in f2.iterrows():
        for col in f1.iterrows():
            matched_org_nr = f2.client_id.isin(f1.columns)
            if matched_org_nr.any() == True:
                sum_of_tickets_per_col = matched_org_nr
                # create a new column in f2 file with the values of total_tickets for each org number matched
                f2 = f2.loc[:].assign(Total_Tickets=sum_of_tickets_per_col)
    return f2
If anyone has any suggestions on how to solve this, I would be glad to hear them.
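For reference, here is a minimal sketch of one possible approach, not necessarily the only or best one. It assumes the column labels of df1 have the same dtype as df2.client_id (if the labels were read from the csv as strings, they would need to be cast first). The sample values are copied from the tables above, and the variable name `tickets` is only illustrative. The idea is to turn the single row of df1 into a Series indexed by client ID and look each client_id up in it, which avoids the nested iterrows() loops. Note that isin() only returns True/False per row, so it cannot carry the ticket counts by itself.

```python
import pandas as pd

# Sample data mirroring the tables in the question
# (assumption: client IDs are integers in both frames).
df1 = pd.DataFrame([[122, 40, 13]], columns=[30, 5, 100])
df2 = pd.DataFrame({
    "account_id": ["4char", "3char", "2char", "16char", "17char"],
    "client_id": [4, 5, 30, 9, 100],
})

# The single row of df1 as a Series: index = client ID, value = total tickets.
tickets = df1.iloc[0]

# Look up every client_id in that Series; IDs absent from df1 become missing.
# The nullable "Int64" dtype keeps the counts as integers despite the gaps.
df = df2.assign(
    total_tickets=df2["client_id"].map(tickets).astype("Int64")
)

print(df)
```

If a file is needed, the result could then be written out with something like `df.to_csv("result.csv", index=False)`, where the file name is just a placeholder.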