Speeding up Pandas: finding all columns that satisfy a set of conditions

I have data in a Pandas DataFrame that looks like this:


| id | entity | name | value | location |

Here id is an integer, entity is an integer, name is a string, value is an integer, and location is a string (for example the US, Canada, or the UK).


Now I want to add a new column, "flag", to this DataFrame, with values assigned as follows:


    # iterrows() yields (index, row) pairs, so unpack both
    for idx, d in df.iterrows():
        if d.entity == 10 and d.value != 1000 and d.location == "CA":
            df.loc[idx, "flag"] = "A"
        elif d.entity != 10 and d.entity != 0 and d.value == 1000 and d.location == "US":
            df.loc[idx, "flag"] = "C"
        elif d.entity == 0 and d.value == 1000 and d.location == "US":
            df.loc[idx, "flag"] = "B"
        else:
            print("Different case")

Is there a way to speed this up by using built-in functions instead of a for loop?


侃侃尔雅
3 Answers

郎朗坤

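Use np.select: you pass it a list of conditions and a matching list of choices, it picks the choice whose condition holds for each row, and you can specify a default value for rows where none of the conditions is met. The conditions are built from the DataFrame's columns (df), not from a single row:

    import numpy as np

    conditions = [
        (df.entity == 10) & (df.value != 1000) & (df.location == 'CA'),
        (df.entity != 10) & (df.entity != 0) & (df.value == 1000) & (df.location == 'US'),
        (df.entity == 0) & (df.value == 1000) & (df.location == 'US'),
    ]
    choices = ["A", "C", "B"]
    df['flag'] = np.select(conditions, choices, default="Different case")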
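To make the approach above concrete, here is a minimal, self-contained sketch. The four sample rows are invented for illustration and are not from the question; only the column names and the three conditions come from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data that follows the schema described in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "entity": [10, 5, 0, 10],
    "name": ["a", "b", "c", "d"],
    "value": [500, 1000, 1000, 1000],
    "location": ["CA", "US", "US", "CA"],
})

# One boolean Series per rule; each comparison is wrapped in parentheses.
conditions = [
    (df["entity"] == 10) & (df["value"] != 1000) & (df["location"] == "CA"),
    (df["entity"] != 10) & (df["entity"] != 0) & (df["value"] == 1000) & (df["location"] == "US"),
    (df["entity"] == 0) & (df["value"] == 1000) & (df["location"] == "US"),
]
choices = ["A", "C", "B"]

# Rows that match none of the conditions receive the default.
df["flag"] = np.select(conditions, choices, default="Different case")
print(df[["entity", "value", "location", "flag"]])
```

With this sample data the last row (entity 10, value 1000, location CA) matches none of the three rules, so it falls through to the default.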

LEATH

Wrap each comparison in () and replace and with the bitwise & so that the conditions work with numpy.select (and build them from df rather than from a single row):

    m = [
        (df.entity == 10) & (df.value != 1000) & (df.location == 'CA'),
        (df.entity != 10) & (df.entity != 0) & (df.value == 1000) & (df.location == 'US'),
        (df.entity == 0) & (df.value == 1000) & (df.location == 'US'),
    ]
    df['flag'] = np.select(m, ["A", "C", "B"], default="Different case")
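The reason for the parentheses and for & can be seen in a short sketch. The two-row DataFrame is made up for illustration; the point is only how Python parses the expressions.

```python
import pandas as pd

# Hypothetical two-row frame, just enough to exercise the operators.
df = pd.DataFrame({"entity": [10, 0], "value": [500, 1000]})

# `and` needs a single True/False from each side, which a Series cannot give.
try:
    df["entity"] == 10 and df["value"] != 1000
    and_failed = False
except ValueError:
    and_failed = True

# Without parentheses, & binds tighter than == and !=, so Python first
# evaluates 10 & df["value"]; the resulting chained comparison then also
# needs the truth value of a Series and raises ValueError.
try:
    df["entity"] == 10 & df["value"] != 1000
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True

# Parenthesized comparisons combined with & give an element-wise mask.
mask = (df["entity"] == 10) & (df["value"] != 1000)
print(and_failed, unparenthesized_failed, mask.tolist())
```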

绝地无双

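You wrote "find all columns that satisfy a set of conditions", but your code shows that you are actually trying to add a new column whose value in each row is computed from the values of the other columns in that same row.

If that is the case, you can use df.apply and give it a function that computes the value for a given row:

    def flag_value(row):
        if row.entity == 10 and row.value != 1000 and row.location == "CA":
            return "A"
        elif row.entity != 10 and row.entity != 0 and row.value == 1000 and row.location == "US":
            return "C"
        elif row.entity == 0 and row.value == 1000 and row.location == "US":
            return "B"
        else:
            return "Different case"

    df['flag'] = df.apply(flag_value, axis=1)

See this related question for more information.

If you really want to find all rows that satisfy some conditions, the usual way to do this with a Pandas DataFrame is to use df.loc with boolean indexing. Each comparison needs its own parentheses, because & binds more tightly than == and !=:

    only_a_cases = df.loc[(df.entity == 10) & (df.value != 1000) & (df.location == "CA")]
    # or:
    only_a_cases = df.loc[lambda df: (df.entity == 10) & (df.value != 1000) & (df.location == "CA")]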
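As a rough check of the two ideas in this answer, the sketch below runs the row-wise df.apply version next to a vectorized np.select version on invented sample rows and confirms they produce the same flags, then filters rows with df.loc. Note that df.apply with axis=1 calls the Python function once per row, so on large frames it is generally slower than the vectorized form; the sample data and the names via_apply and via_select are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "entity": [10, 5, 0, 10],
    "value": [500, 1000, 1000, 1000],
    "location": ["CA", "US", "US", "CA"],
})

def flag_value(row):
    """Compute the flag for a single row (called once per row by apply)."""
    if row["entity"] == 10 and row["value"] != 1000 and row["location"] == "CA":
        return "A"
    elif row["entity"] not in (10, 0) and row["value"] == 1000 and row["location"] == "US":
        return "C"
    elif row["entity"] == 0 and row["value"] == 1000 and row["location"] == "US":
        return "B"
    return "Different case"

via_apply = df.apply(flag_value, axis=1)

# The same rules expressed as whole-column boolean masks.
conditions = [
    (df["entity"] == 10) & (df["value"] != 1000) & (df["location"] == "CA"),
    (df["entity"] != 10) & (df["entity"] != 0) & (df["value"] == 1000) & (df["location"] == "US"),
    (df["entity"] == 0) & (df["value"] == 1000) & (df["location"] == "US"),
]
via_select = np.select(conditions, ["A", "C", "B"], default="Different case")

print(via_apply.tolist() == via_select.tolist())

# Boolean indexing with df.loc keeps only the rows matching the "A" rule.
only_a_cases = df.loc[conditions[0]]
print(only_a_cases)
```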