How do I extract certain rows based on a condition?

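I'm working with a dataset whose first column contains sentiment (category) labels. Because the dataset is imbalanced, I need to extract the same number of rows for each category. For example, if there are 10 categories, I want to select only 100 sample rows from each one, so the result should be 1000 sample rows.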
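Drawing the same number of rows from every class is usually called stratified (or balanced) sampling. A minimal sketch, assuming pandas 1.1 or newer (for `DataFrameGroupBy.sample`) and that the label column is named `category` as in the table below; the helper name `balanced_sample` and the toy data are hypothetical:

```python
import pandas as pd


def balanced_sample(df: pd.DataFrame, label_col: str, n: int, seed: int = 0) -> pd.DataFrame:
    """Randomly draw exactly n rows from every class in label_col (hypothetical helper)."""
    # GroupBy.sample draws n rows inside each group independently.
    # With the default replace=False it raises ValueError if a class has fewer than n rows.
    return df.groupby(label_col).sample(n=n, random_state=seed)


# Hypothetical toy data: three classes of unequal size
toy = pd.DataFrame({
    "category": ["happy"] * 5 + ["sadness"] * 3 + ["anger"] * 4,
    "A": range(12),
})

sample = balanced_sample(toy, "category", n=2)
print(sample["category"].value_counts().sort_index())
```

For the real data this would be `balanced_sample(df, "category", n=100)`; fixing `seed` makes the random draw reproducible.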


What I have tried:

def append_new_rows(df, new_df, s):
    c = 0
    for index, row in df.iterrows():
        if s == row[0]:
            if c <= 100:
                new_df.append(row)
                c += 1
    return df_2

for s in sorted(list(set(df.category))):
    new_df = append_new_rows(df, new_df, s)
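The loop above tries to keep the first 100 rows of each category, visiting categories in sorted order. A vectorized sketch of that same idea, assuming pandas and a label column named `category`; `first_n_per_class` and the toy frame are hypothetical:

```python
import pandas as pd


def first_n_per_class(df: pd.DataFrame, label_col: str, n: int) -> pd.DataFrame:
    # groupby(...).head(n) keeps at most n rows per group, in their original order
    first = df.groupby(label_col).head(n)
    # A stable sort groups rows by class (like looping over sorted(set(...)))
    # while preserving the original row order inside each class
    return first.sort_values(label_col, kind="stable")


toy = pd.DataFrame({
    "category": ["happy", "sadness", "happy", "happy", "sadness"],
    "A": [1, 2, 3, 4, 5],
})

result = first_n_per_class(toy, "category", n=2)
print(result)
```

Unlike random sampling, this is deterministic: it always takes the earliest rows of each class, which may be biased if the file is ordered by time or source.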

Dataset

----------------------------
| category | A  | B  | C | D |
----------------------------
| happy    | ...| ...|...|...|
| ...      | ...| ...|...|...|
| sadness  | ...| ...|...|...|

Expected output

----------------------------
| category | A  | B  | C | D |
----------------------------
| happy    | ...| ...|...|...|
... 100 samples of happy
| ...      | ...| ...|...|...|
| sadness  | ...| ...|...|...|
... 100 samples of sadness
...
...
1000 sample rows


浮云间

Helenr

You were almost there; you just need something like this:

import pandas as pd

def append_new_df(df, df_2, s, n):
    c = 1  # number of the next row to take for category s
    for index, row in df.iterrows():
        # the category label is in the first column; .iloc does positional access
        if s == row.iloc[0]:
            if c <= n:
                # DataFrame.append was removed in pandas 2.0; pd.concat returns
                # a new frame, so the result must be reassigned to df_2
                df_2 = pd.concat([df_2, row.to_frame().T])
                c += 1
    return df_2
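The counter `c` in the answer numbers the rows already seen for one category. `groupby(...).cumcount()` computes that running number (starting at 0) for every category at once, so the same "first n rows per category" selection can be written as a boolean mask. A sketch assuming pandas; `select_with_counter` and the toy data are hypothetical:

```python
import pandas as pd


def select_with_counter(df: pd.DataFrame, label_col: str, n: int) -> pd.DataFrame:
    # cumcount() yields 0, 1, 2, ... within each class in original row order,
    # playing the role of the per-category counter c (which starts at 1 there)
    position = df.groupby(label_col).cumcount()
    # Keep a row only while its class counter is below n, i.e. the first n rows
    return df[position < n]


toy = pd.DataFrame({
    "category": ["happy", "happy", "sadness", "happy", "sadness"],
    "A": [10, 20, 30, 40, 50],
})

kept = select_with_counter(toy, "category", n=2)
print(kept)
```

This makes a single pass over the frame instead of one `iterrows()` pass per category, and it keeps the original dtypes, which row-by-row concatenation tends to turn into `object`.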