Subsetting a df with an if statement - Pandas

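I want to create and return a subset of df using an if statement. Specifically, in the code below I have two different groups of values. The df I want to return depends on which group a given value belongs to.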


In the code below, the specific values are held in normal and different. The value of place determines how df is subset.


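Below is my attempt. place will only ever hold a single value, so it will never exactly equal either list. Is it possible to return df when place equals a single value within one of these lists?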


I want to return df1 for use in later tasks.


import pandas as pd

df = pd.DataFrame({
    'period' : [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
    })

place = 'a'

normal = ['a','b']
different = ['v','w','x','y','z']

different_subset_start = 2
normal_subset_start = 4
subset_end = 8

for val in df:
    if place in different:
        print('place is different')
        df1 = df[(df['period'] >= different_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
        return df1
    elif place in normal:
        print('place is normal')
        df1 = df[(df['period'] >= normal_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
        return df1
    else:
        print('Incorrect input for Day. Day Floater could not be scheduled. Please check input value')
    return

print(df1)


The expected output is df1, returned for later use.


   period
2     2.0
4     3.0
5     4.0
6     5.0
7     7.0
9     8.0


动漫人物
Views: 196 · Answers: 2
2 Answers

素胚勾勒不出你

To check whether an object is contained in something, rather than whether it equals something, use in:

if place in different:

and likewise:

elif place in normal:

Edit: if you turn this into a function, it should look like the code below. Basically, you just write def my_function_name(arguments): and then indent the rest of the code so that it belongs to the function. Like this:

import pandas as pd

def get_subset(df, place):
    normal = ['a','b']
    different = ['v','w','x','y','z']
    different_subset_start = 2
    normal_subset_start = 4
    subset_end = 8
    if place in different:
        df1 = df[(df['period'] >= different_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
    elif place in normal:
        df1 = df[(df['period'] >= normal_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
    else:
        df1 = None
    return df1

df = pd.DataFrame({
    'period' : [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
    })

place = 'a'
print(get_subset(df, place))
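To make the difference between == and in concrete, and to show how a caller might deal with the None returned for an unknown place, here is a minimal sketch. The get_subset in it is a condensed restatement of the function above so the example runs on its own; the start variable, the hard-coded bounds, the names df2 and result, and the 'q' input are assumptions added for illustration.

```python
import pandas as pd

normal = ['a', 'b']
different = ['v', 'w', 'x', 'y', 'z']
place = 'a'

# A single string never equals a list, so == is False here.
print(place == normal)   # False
# Membership is what the question actually needs.
print(place in normal)   # True


def get_subset(df, place):
    """Condensed restatement of the function above (illustrative only)."""
    if place in different:
        start = 2          # different_subset_start
    elif place in normal:
        start = 4          # normal_subset_start
    else:
        return None        # unknown place
    mask = (df['period'] >= start) & (df['period'] <= 8)  # 8 is subset_end
    return df[mask].drop_duplicates(subset='period')


df = pd.DataFrame({
    'period': [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
})

df1 = get_subset(df, 'a')        # 'a' is in normal, so the range is 4..8
print(df1.index.tolist())        # [5, 6, 7, 9]

df2 = get_subset(df, 'v')        # 'v' is in different, so the range is 2..8
print(df2.index.tolist())        # [2, 4, 5, 6, 7, 9]

# Check for None before using the result in a later task.
result = get_subset(df, 'q')     # 'q' is a hypothetical invalid input
if result is None:
    print('unknown place')
```

Note that with place = 'a' the normal branch is taken, so the result starts at period 4.0; the rows starting at period 2.0 come from the different branch.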

呼如林

Look at for val in df: in your code. This construct is odd, because you never use the val variable. Change the last fragment of your code to something like this:

def fn():
    if place in different:
        print('place is different')
        return df[df.period.between(different_subset_start, subset_end)]\
            .drop_duplicates(subset='period')
    elif place in normal:
        print('place is normal')
        return df[df.period.between(normal_subset_start, subset_end)]\
            .drop_duplicates(subset = 'period')
    else:
        print('Incorrect input for place. Please check value')

In your case subset = 'period' is redundant, because period is the only column in the DataFrame. The final return is not needed either: if execution reaches the end of the function, it returns without a value (that is, it returns None). One more detail: if your DataFrame has a single column, might a Series be enough?
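The closing suggestion about using a Series can be sketched roughly as follows. This is only an illustration under assumptions: the helper name subset_series and the variable names period and out are made up here, and the bounds 4 and 8 stand in for normal_subset_start and subset_end from the question.

```python
import pandas as pd

# The single 'period' column from the question, held as a Series instead.
period = pd.Series(
    [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
    name='period',
)


def subset_series(s, start, end):
    """Keep values in [start, end], then drop repeated values."""
    # Series.between is inclusive on both ends by default.
    # Series.drop_duplicates needs no subset argument and keeps the first occurrence.
    return s[s.between(start, end)].drop_duplicates()


out = subset_series(period, 4, 8)
print(out.tolist())         # [4.0, 5.0, 7.0, 8.0]
print(out.index.tolist())   # [5, 6, 7, 9]
```

The original index labels are preserved, so the result lines up with the rows the DataFrame version would return for the same bounds.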