How to compare values within the same column of a DataFrame

4 Answers

隔江千里

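My approach is to use a pivot table. Assume that no two rows share the same ("country", "year") pair. Under that assumption, the sum aggregation simply returns the single PIB value for each pair.

    import pandas as pd

    # aggfunc="sum" is the string form of np.sum; with one row per pair it returns that row's PIB
    table = pd.pivot_table(df, values='PIB', index=['country'],
                           columns=['year'], aggfunc='sum')[[2002, 2007]]
    # countries whose PIB was higher in 2002 than in 2007
    list(table[table[2002] > table[2007]].index)

The pivot table has one row per country and one column for each of the years 2002 and 2007.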
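The pivot-table approach above can be tried end to end with a small sample. This is only a sketch: the four rows are hypothetical sample data, and the `declined` variable name and the duplicate check are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: exactly one row per (country, year) pair.
df = pd.DataFrame({
    "country": ["Afganistan", "Zimbabwe", "Afganistan", "Zimbabwe"],
    "PIB": [200, 200, 100, 300],
    "year": [2002, 2002, 2007, 2007],
})

# Check the assumption the answer relies on: no duplicated (country, year) pairs.
# If duplicates existed, "sum" would silently add their PIB values together.
assert not df.duplicated(subset=["country", "year"]).any()

# One row per country, one column per year.
table = pd.pivot_table(df, values="PIB", index="country",
                       columns="year", aggfunc="sum")[[2002, 2007]]

# Countries whose PIB was higher in 2002 than in 2007.
declined = list(table[table[2002] > table[2007]].index)
print(declined)
```

With this sample, only Afganistan has a lower PIB in 2007 than in 2002.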


白猪掌柜的

I suggest building Series indexed by the country column. For the comparison to work, the 2007 and 2002 Series must contain the same countries, that is, identical index values:

    df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                       'PIB': [200, 200, 100, 300],
                       'year': [2002, 2002, 2007, 2007]})
    print(df)
          country  PIB  year
    0  Afganistan  200  2002
    1    Zimbabwe  200  2002
    2  Afganistan  100  2007
    3    Zimbabwe  300  2007

    df = df.set_index('country')
    print(df)
                PIB  year
    country
    Afganistan  200  2002
    Zimbabwe    200  2002
    Afganistan  100  2007
    Zimbabwe    300  2007

    s1 = df.loc[df.year == 2007, 'PIB']
    s2 = df.loc[df.year == 2002, 'PIB']
    print(s1)
    country
    Afganistan    100
    Zimbabwe      300
    Name: PIB, dtype: int64

    print(s2)
    country
    Afganistan    200
    Zimbabwe      200
    Name: PIB, dtype: int64

    countries = s1.index[s1 < s2]
    print(countries)
    Index(['Afganistan'], dtype='object', name='country')

Another idea is to pivot first with DataFrame.pivot, then select the two year columns, compare them, and filter the index with boolean indexing:

    # country is the index at this point, so restore it as a column before pivoting
    df1 = df.reset_index().pivot(index='country', columns='year', values='PIB')
    print(df1)
    year        2002  2007
    country
    Afganistan   200   100
    Zimbabwe     200   300

    countries = df1.index[df1[2007] < df1[2002]]
    print(countries)
    Index(['Afganistan'], dtype='object', name='country')
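The requirement that both Series hold the same countries matters in practice: if a country is missing from one of the years, comparing the two Series directly raises an error. One possible way to handle that, sketched here with a hypothetical extra country ("Chile", present only in 2002) and hypothetical variable names, is to align the Series on their shared labels with `Series.align(join="inner")` before comparing.

```python
import pandas as pd

# Hypothetical data: "Chile" has a 2002 row but no 2007 row.
df = pd.DataFrame({
    "country": ["Afganistan", "Zimbabwe", "Chile", "Afganistan", "Zimbabwe"],
    "PIB": [200, 200, 150, 100, 300],
    "year": [2002, 2002, 2002, 2007, 2007],
}).set_index("country")

s2007 = df.loc[df.year == 2007, "PIB"]
s2002 = df.loc[df.year == 2002, "PIB"]

# Keep only the countries present in both years, so the indexes are identical.
a2007, a2002 = s2007.align(s2002, join="inner")

# Countries whose PIB dropped between 2002 and 2007.
countries = a2007.index[a2007 < a2002].tolist()
print(countries)
```

Countries that appear in only one year are dropped by the inner join, so they never show up in the result. Whether that is the right behavior depends on the data; it is an assumption of this sketch.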


尚方宝剑之说

Here is my DataFrame:

    df = pd.DataFrame([
        {"country": "a", "PIB": 2, "year": 2002},
        {"country": "b", "PIB": 2, "year": 2002},
        {"country": "a", "PIB": 1, "year": 2007},
        {"country": "b", "PIB": 3, "year": 2007},
    ])

If I filter on the two years 2002 and 2007, I get:

    df_2002 = df[df["year"] == 2002]
    out :
      country  PIB  year
    0       a    2  2002
    1       b    2  2002

    df_2007 = df[df["year"] == 2007]
    out :
      country  PIB  year
    2       a    1  2007
    3       b    3  2007

You want to compare how PIB evolved for each country. Pandas does not know that: it compares values by matching index labels, which is not what you want and is not even possible here because the two indexes differ. So you just need to use set_index():

    df.set_index("country", inplace=True)

    df_2002 = df[df["year"] == 2002]
    out :
             PIB  year
    country
    a          2  2002
    b          2  2002

    df_2007 = df[df["year"] == 2007]
    out :
             PIB  year
    country
    a          1  2007
    b          3  2007

Now you can do the comparison:

    res = df_2002.PIB > df_2007.PIB
    out :
    country
    a     True
    b    False
    Name: PIB, dtype: bool

    # to get the list of countries
    res[res].index.tolist()
    out : ['a']
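To make the index problem concrete, here is a small sketch using the same four rows. The variable names (`pib_2002`, `failed`, `result`, and so on) are hypothetical. It first shows the comparison failing with the default integer index, then succeeding once the rows are indexed by country.

```python
import pandas as pd

df = pd.DataFrame([
    {"country": "a", "PIB": 2, "year": 2002},
    {"country": "b", "PIB": 2, "year": 2002},
    {"country": "a", "PIB": 1, "year": 2007},
    {"country": "b", "PIB": 3, "year": 2007},
])

# With the default index, the 2002 rows are labeled 0, 1 and the 2007 rows 2, 3.
pib_2002 = df.loc[df["year"] == 2002, "PIB"]
pib_2007 = df.loc[df["year"] == 2007, "PIB"]

try:
    pib_2002 > pib_2007
    failed = False
except ValueError as exc:
    # pandas refuses to compare Series whose labels differ.
    failed = True
    print(exc)

# Indexing by country gives both Series the same labels ("a", "b").
by_country = df.set_index("country")
res = (by_country.loc[by_country["year"] == 2002, "PIB"]
       > by_country.loc[by_country["year"] == 2007, "PIB"])

# Boolean indexing with the mask itself keeps only the True entries.
result = res[res].index.tolist()
print(result)
```

This version uses `set_index` without `inplace=True` so the original `df` stays untouched, which is a design choice of the sketch rather than something the answer requires.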


繁华开满天机

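Try this (given that you only need the list of countries):

    # iterate over unique countries so each one is reported at most once;
    # assumes "country" is a regular column and every country has a 2002 and a 2007 row
    [i for i in df.country.unique()
     if df[(df.country == i) & (df.year == 2007)].PIB.iloc[0]
        < df[(df.country == i) & (df.year == 2002)].PIB.iloc[0]]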
