
Create a dummy variable by firm based on a condition in pandas

I have a pandas DataFrame like this:


import pandas as pd

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994, 2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
}

df = pd.DataFrame(data)

df

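I want to create a dummy variable for each firm, subject to the following condition:


Whenever "var" has been equal to or below 0.5 for two consecutive years, "dummy" equals 1 (starting with the second of those years), so the "dummy" variable should look like this: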


data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994, 2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
    "dummy": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1],
}

df = pd.DataFrame(data)

df

What is the best way to do this?


犯罪嫌疑人X
3 Answers

墨色风雨

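You can simply shift within each firm, check the shifted values against the threshold, and combine that with the same check on the original series:

df.groupby('firm')['var'].shift().le(.5) & df['var'].le(.5)

This should be slightly faster than groupby().apply. Another approach (better if you need to check more than two years) is rolling:

df['dummy'] = df.groupby('firm')['var'].transform(lambda x: x.rolling(2).max().le(.5))

Output:

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10     True
11     True
12    False
13    False
14    False
15    False
16     True
17     True
Name: var, dtype: bool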
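To illustrate why the rolling variant scales to longer windows, here is a minimal sketch that wraps it in a helper. The function name `low_streak_dummy` and its parameters `n_years` and `threshold` are hypothetical, introduced only for this example. It assumes the rows of each firm are already sorted by year with no missing years.

```python
import pandas as pd

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994,
             2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
}
df = pd.DataFrame(data)


def low_streak_dummy(frame, n_years=2, threshold=0.5):
    """Hypothetical helper: 1 where `var` was <= threshold in each of the
    last `n_years` rows of the same firm, else 0."""
    # The rolling max over the window is <= threshold only if every value
    # in the window is. The first n_years - 1 rows of each firm give NaN,
    # and NaN <= threshold evaluates to False.
    rolling_max = frame.groupby("firm")["var"].transform(
        lambda s: s.rolling(n_years).max()
    )
    return rolling_max.le(threshold).astype(int)


df["dummy"] = low_streak_dummy(df)              # two consecutive years
df["dummy3"] = low_streak_dummy(df, n_years=3)  # three consecutive years
print(df)
```

With the shift approach, a three-year window would need a second `shift(2)` term, whereas here only the window size changes.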

慕田峪7331174

Your requirement translates almost directly into pandas. First groupby firm, then check whether your condition holds inside apply. You can get the previous year's value with shift.

import pandas as pd

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994, 2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
}
df = pd.DataFrame(data)

# Solution
# group_keys=False keeps the original row index so the result aligns with df
# on pandas >= 2.0; astype('int8') replaces the deprecated Series.view('i1').
df['dummy'] = df.groupby('firm', group_keys=False)['var'].apply(lambda x: (x.shift() <= .5) & (x <= .5)).astype('int8')
print(df)

Out:

    firm  year  var  dummy
0      1  2000  3.0      0
1      1  2001  2.0      0
2      1  2002  1.0      0
3      1  2003  0.5      0
4      2  1990  5.0      0
5      2  1991  3.0      0
6      2  1992  2.0      0
7      2  1993  0.5      0
8      2  1994  0.5      1
9      3  2010  0.5      0
10     3  2011  0.0      1
11     3  2012  0.0      1
12     4  2005  8.0      0
13     4  2006  5.0      0
14     4  2007  3.0      0
15     4  2008  0.5      0
16     4  2009  0.5      1
17     4  2010  0.5      1
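Note that shift compares each row with the previous row of the same firm, which is the previous year only if the data is sorted and has no missing years. The question's data satisfies that, but if yours might not, one possible sketch is to also require that the year difference is exactly 1. The `gappy` frame below is made-up illustration data, not taken from the question.

```python
import pandas as pd

# Hypothetical data: firm 1 has no row for 2002.
gappy = pd.DataFrame({
    "firm": [1, 1, 1, 2, 2],
    "year": [2000, 2001, 2003, 2000, 2001],
    "var": [3.0, 0.5, 0.5, 0.5, 0.5],
})

# Sort so that "previous row" means "previous observed year" within a firm.
gappy = gappy.sort_values(["firm", "year"])
grouped = gappy.groupby("firm")

low = gappy["var"].le(0.5)                        # this year is at or below 0.5
prev_low = grouped["var"].shift().le(0.5)         # previous row is at or below 0.5
prev_is_last_year = grouped["year"].diff().eq(1)  # previous row is exactly one year earlier

gappy["dummy"] = (low & prev_low & prev_is_last_year).astype(int)
print(gappy)
```

Here the 2003 row of firm 1 gets 0 because its previous row is 2001, not 2002. Without the `prev_is_last_year` term it would be flagged as 1.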

炎炎设计

Let's try groupby with shift:

# group_keys=False keeps the flat row index shown below on pandas >= 2.0
df.groupby('firm', group_keys=False)['var'].apply(lambda x : x.shift().le(0.5) & x.le(0.5))

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10     True
11     True
12    False
13    False
14    False
15    False
16     True
17     True
Name: var, dtype: bool
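As a quick sanity check, the boolean result can be converted to 0/1 and compared with the "dummy" column the question expects. This is only a sketch using the question's own data. The names `computed` and `mismatches` are chosen for illustration.

```python
import pandas as pd

data = {
    "firm": [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4],
    "year": [2000, 2001, 2002, 2003, 1990, 1991, 1992, 1993, 1994,
             2010, 2011, 2012, 2005, 2006, 2007, 2008, 2009, 2010],
    "var": [3, 2, 1, 0.5, 5, 3, 2, 0.5, 0.5, 0.5, 0, 0, 8, 5, 3, 0.5, 0.5, 0.5],
    "dummy": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1],
}
df = pd.DataFrame(data)

# Same condition as above, written without apply: the previous row of the
# firm and the current row must both be at or below 0.5.
computed = (df.groupby("firm")["var"].shift().le(0.5) & df["var"].le(0.5)).astype(int)

# Rows where the computed value differs from the expected column.
mismatches = df.loc[computed.ne(df["dummy"])]
print(len(mismatches))  # 0 for the question's data
```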