
IndexError: positional indexers are out-of-bounds when working on a DataFrame with deleted rows

I get the error IndexError: positional indexers are out-of-bounds when running the following code on a DataFrame from which rows have been deleted, but not on a brand-new DataFrame:


I am using the following functions to clean the data:


import pandas as pd


def get_list_of_corresponding_projects(row: pd.Series, df: pd.DataFrame) -> list:
    """Returns a list of indexes indicating the 'other' (not the current one) records that are for the same year, topic and being a project.
    """
    current_index = row.name
    current_year = row['year']
    current_topic = row['topic']

    if row['Teaching Type'] == "Class":
        mask = (df.index != current_index) & (df['year'] == current_year) & (df['topic'] == current_topic) & (df['Teaching Type'] == "Project")
        return df[mask].index.values.tolist()
    else:
        return list()


def fix_classes_with_corresponding_projects(df: pd.DataFrame) -> pd.DataFrame:
    """Change the Teaching Type of projects having a corresponding class from 'Project' to 'Practical Work'
    """

    # find the projects corresponding to that class
    df['matching_lines'] = df.apply(lambda row: get_list_of_corresponding_projects(row, df), axis=1)

    # Turn the series of lists into a single list without duplicates
    indexes_to_fix = list(set(sum(df['matching_lines'].values.tolist(), [])))

    # Update the records
    df.iloc[indexes_to_fix, df.columns.get_loc('Teaching Type')] = "Practical Work"

    # Remove the column that was used for tagging
    df.drop(['matching_lines'], axis=1, inplace=True)

    # return the data
    return df

These functions work fine when run on a brand-new DataFrame:


df = pd.DataFrame({'year': ['2015','2015','2015','2016','2016','2017','2017','2017','2017'],
                   'Teaching Type':['Class', 'Project', 'Class', 'Class', 'Project', 'Class', 'Class', 'Class', 'Project' ],
                   'topic': ['a', 'a', 'b', 'a', 'c','a','b','a','a']})
display(df)

df = fix_classes_with_corresponding_projects(df)
display(df)


When the DataFrame has had rows deleted, the error is raised on the following line:


df.iloc[indexes_to_fix, df.columns.get_loc('Teaching Type')] = "Practical Work"

What am I missing here? I thought that because I am using index values, I would avoid this kind of error.
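For reference, here is a minimal sketch of how the failure can arise. The post does not show how the rows were deleted, so the use of `drop()` and the shortened data below are assumptions made only for illustration. The point is that `drop()` keeps the original index labels, so a label can end up larger than the number of remaining rows.

```python
import pandas as pd

# Hypothetical data, shortened from the sample in the question.
df = pd.DataFrame({
    'year': ['2015', '2015', '2016', '2016'],
    'Teaching Type': ['Class', 'Class', 'Class', 'Project'],
    'topic': ['a', 'b', 'c', 'c'],
})

# Assumption: rows were deleted with drop(), which keeps the original labels.
df = df.drop([0, 1])
print(df.index.tolist())  # the remaining labels; positions are still 0 and 1

# An index label, like the values returned by df[mask].index in the question.
labels_to_fix = [3]
col = df.columns.get_loc('Teaching Type')

try:
    # iloc interprets 3 as a position, but only positions 0 and 1 exist.
    df.iloc[labels_to_fix, col] = "Practical Work"
    raised = False
except IndexError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

The exact error text may differ between pandas versions, so the sketch only checks that an `IndexError` is raised.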


慕姐8265434
1 Answer

元芳怎么了

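Your fix_classes_with_corresponding_projects function has a logic flaw: indexes_to_fix contains the index values (labels) of the rows to fix, not their positions. You then select with iloc, which selects rows by position. What you need is

    # Update the records
    df.loc[indexes_to_fix, 'Teaching Type'] = "Practical Work"

instead of

    df.iloc[indexes_to_fix, df.columns.get_loc('Teaching Type')] = "Practical Work"

So your original code only worked by coincidence: a brand-new DataFrame gets the default index 0, 1, 2, ..., so labels and positions happen to be identical. Once rows are deleted, the remaining labels no longer match their positions, and a label can exceed the number of rows. If you had a non-numeric index (for example, by creating the sample DataFrame with index=list('abcdefghi')), the flaw would have become apparent immediately.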
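The difference between labels and positions can be sketched as follows. The data and the hand-picked `indexes_to_fix` are hypothetical, and the second option (translating labels to positions with `Index.get_indexer`) is not part of the answer above; it is shown only as one possible way to keep `iloc` if positions are really wanted.

```python
import pandas as pd

# Hypothetical sample with a non-numeric index, as suggested above.
df = pd.DataFrame(
    {
        'year': ['2015', '2015', '2016', '2016'],
        'Teaching Type': ['Class', 'Project', 'Class', 'Project'],
        'topic': ['a', 'a', 'c', 'c'],
    },
    index=list('abcd'),
)

# Index labels of the rows to update, chosen by hand for illustration.
indexes_to_fix = ['b', 'd']

# Option 1: loc selects rows by label, so the labels can be used directly.
by_label = df.copy()
by_label.loc[indexes_to_fix, 'Teaching Type'] = "Practical Work"

# Option 2: keep iloc, but translate the labels into positions first.
by_position = df.copy()
positions = by_position.index.get_indexer(indexes_to_fix)
col = by_position.columns.get_loc('Teaching Type')
by_position.iloc[positions, col] = "Practical Work"

print(by_label['Teaching Type'].tolist())
print(by_label.equals(by_position))
```

Using `loc` is the simpler choice here, since the function in the question already collects labels via `df[mask].index`.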