How to apply a function to two columns of a Pandas DataFrame

Suppose I have a df with the columns 'ID', 'col_1', 'col_2', and I define a function:


f = lambda x, y : my_function_expression


Now I want to apply f to the two columns 'col_1' and 'col_2' of df to calculate a new column 'col_3' element-wise, somewhat like:


df['col_3'] = df[['col_1','col_2']].apply(f)  

# Pandas gives : TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'

How can I do this?


** Detailed example added below **


import pandas as pd

df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ['a','b','c','d','e','f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

#df['col_3'] = df[['col_1','col_2']].apply(get_sublist,axis=1)
# expect above to output df as below

  ID  col_1  col_2            col_3
0  1      0      1       ['a', 'b']
1  2      2      4  ['c', 'd', 'e']
2  3      3      5  ['d', 'e', 'f']


慕妹3146593
635 views, 3 answers
3 Answers

慕妹3242003

Here is an example using apply on the DataFrame, which I call with axis=1. Note the difference: instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values you need.

In [49]: df
Out[49]:
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):
   ....:     return x[0] + x[1]
   ....:

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]:
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000

Depending on your use case, it is sometimes helpful to create a pandas group object and then use apply on the group.
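
For reference, here is a minimal sketch of how the same row-wise idea might look on the example data from the question. The helper name get_sublist_from_row is hypothetical (it is not part of the answer above); it takes one row as a Series and looks the two values up by column label.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist_from_row(row):
    # Hypothetical helper: `row` is a pandas Series holding one row of df,
    # so the two inputs are read by column label instead of being passed
    # as two separate arguments.
    return mylist[row['col_1']:row['col_2'] + 1]

# axis=1 makes apply call the function once per row.
df['col_3'] = df[['col_1', 'col_2']].apply(get_sublist_from_row, axis=1)
print(df)
```

With axis=1 the function is called once per row, so on large frames this is likely to be much slower than a vectorized expression when one is available.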

千万里不及你

A simple solution is:

df['col_3'] = df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)
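
As a rough sketch, this is how the argument-unpacking one-liner could be applied to the question's example, assuming get_sublist from the question plays the role of f. The values are unpacked positionally, so the order of the selected columns decides which one becomes sta and which becomes end.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    return mylist[sta:end + 1]

# Each row of the two-column selection arrives as a Series `x`;
# *x unpacks its values in column order: col_1 -> sta, col_2 -> end.
df['col_3'] = df[['col_1', 'col_2']].apply(lambda x: get_sublist(*x), axis=1)
print(df)
```

Selecting the columns as [['col_2', 'col_1']] instead would pass the arguments in the opposite order, which is worth keeping in mind when the function is not symmetric.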

梵蒂冈之花

There is a clean, one-line way of doing this in pandas:

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and it uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on the original question):

import pandas as pd

df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)

Output of print(df):

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]