How do I extract certain rows based on a condition?

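I am working with a dataset whose first column holds a sentiment or category label. Because the dataset is imbalanced, I need to extract the same number of rows for each category. For example, with 10 categories I want to select 100 sample rows from each, which gives 1000 rows in total.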

What I have tried:

    def append_new_rows(df, new_df, s):
        c = 0
        for index, row in df.iterrows():
            if s == row[0]:
                if c <= 100:
                    new_df.append(row)
                    c += 1
        return df_2

    for s in sorted(list(set(df.category))):
        new_df = append_new_rows(df, new_df, s)

Dataset

----------------------------
| category | A | B | C | D |
----------------------------
| happy    |...|...|...|...|
| ...      |...|...|...|...|
| sadness  |...|...|...|...|

Expected output

----------------------------
| category | A | B | C | D |
----------------------------
| happy    |...|...|...|...|
... 100 samples of happy
| ...      |...|...|...|...|
| sadness  |...|...|...|...|
... 100 samples of sadness
...
1000 sample rows

浮云间

88 views, 1 answer

1 Answer

Helenr

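You were almost there; you just need to do something like this:

    import pandas as pd

    def append_new_df(df, df_2, s, n):
        # Copy at most n rows whose first column equals s from df into df_2.
        c = 1
        for index, row in df.iterrows():
            if s == row.iloc[0]:
                if c <= n:
                    # The original call was df_2.append(row); DataFrame.append was
                    # removed in pandas 2.0, so pd.concat is used here instead.
                    df_2 = pd.concat([df_2, row.to_frame().T])
                    c += 1
        return df_2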
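As an alternative to looping with `iterrows`, the same "n rows per category" selection can probably be done with a group-wise operation. The sketch below is not from the answer above. It assumes the label column is named `category` (as in the question); the function name `balanced_subset`, its parameters, and the toy data are made up for illustration.

```python
import pandas as pd


def balanced_subset(df, label_col="category", n=100, random_state=None):
    """Return at most n rows per distinct value of label_col.

    With random_state=None the first n rows of each class are kept in
    their original order; otherwise the rows are shuffled first, so the
    n rows kept per class are a random draw.
    """
    if random_state is None:
        return df.groupby(label_col, sort=False).head(n)
    # Shuffle all rows once, then keep the first n of each class.
    shuffled = df.sample(frac=1, random_state=random_state)
    return shuffled.groupby(label_col, sort=False).head(n)


# Toy data (hypothetical): 5 "happy" rows and 3 "sadness" rows.
toy = pd.DataFrame({
    "category": ["happy"] * 5 + ["sadness"] * 3,
    "A": range(8),
})

subset = balanced_subset(toy, n=2)
print(subset)
```

Note that a class with fewer than `n` rows simply contributes all of its rows, so the result is only balanced if every class has at least `n` rows. On the asker's data the call would presumably be `balanced_subset(df, n=100)`.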

