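I have a CSV file, news.csv, that contains a lot of data. For each row, I want to check whether it contains any year: 1 if it does, 0 otherwise. The same goes for percentages: 1 if the row contains a percentage, 0 otherwise. I also want to extract the matched values.
Below is my code so far. When I try to extract the percentage, I get an error (ValueError: Wrong number of items passed 2, placement implies 1).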
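import pandas as pd

news = pd.read_csv("news.csv")
# Extract the first number in each row (intended to be the year)
news['year'] = news['STORY'].str.extract(r'(?!\()\b(\d+){1}')
# Count how many numbers each row contains
news["howmanyyear"] = news["STORY"].str.count(r'(?!\()\b(\d+){1}')
news["existyear"] = news["howmanyyear"] != 0
news["existyear"] = news["existyear"].astype(int)
# This is the line that raises the ValueError
news['percentage'] = news['STORY'].str.extract(r'(\s100|\s\d{1})(\.\d+)+%')
news.to_csv('news.csv')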
The year-extraction code seems to work, but it also picks up ordinary numbers, and it extracts only one year per row.
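A likely cause of the ValueError, shown as a minimal sketch: the percentage pattern has two capture groups, so `str.extract` returns a DataFrame with two columns, which cannot be assigned to the single `percentage` column. The sample strings and the one-group pattern below are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical sample rows, used only to show the shape of the result
s = pd.Series(["About 2.5% of people", "no percentage here"])

# Two capture groups: str.extract returns a DataFrame with one column per group
two_groups = s.str.extract(r'(\s100|\s\d{1})(\.\d+)+%')
print(two_groups.shape)

# One capture group with expand=False: str.extract returns a Series,
# which can be assigned to a single column
one_group = s.str.extract(r'(\d+(?:\.\d+)?)\s*%', expand=False)
print(one_group.tolist())
```

Note that `str.extract` only ever returns the first match in each row, which would also explain why only one year is extracted.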
A sample of my CSV file:
ID | STORY
1 | There are a total of 2,070 people died in 2001 due to the virus
2 | 20% of people in the village have diabetes in 2007
3 | About 70 percent of them still believe the rumor
4 | In 2003 and 2020, the pneumonia pandemic spread in the world
Here is the output I want:
ID | STORY | existyear | year | existpercentage | percentage
1 | There are a total of 2,070 people died in 2001 due to the virus | 1 | 2001 | 0 | -
2 | 20% of people in the village have diabetes in 2007 | 1 | 2007 | 1 | 20%
3 | About 70 percent of them still believe the rumor | 0 | - | 1 | 70
4 | In 2003 and 2020, the pneumonia pandemic spread in the world | 1 | 2003,2020 | 0 | -
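One possible way to produce this output, as a sketch under stated assumptions rather than a definitive solution: use `str.findall` to collect every match per row, then derive the 0/1 flag and the joined string from the resulting lists. The two regexes are assumptions about what counts as a "year" (a standalone four-digit number starting with 19 or 20) and a "percentage" (a number followed by `%` or by the word "percent"). The helper `summarize` is a hypothetical name, and the DataFrame is built inline from the sample rows in place of reading news.csv.

```python
import pandas as pd

# Sample rows from the question, built inline instead of reading news.csv
news = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "STORY": [
        "There are a total of 2,070 people died in 2001 due to the virus",
        "20% of people in the village have diabetes in 2007",
        "About 70 percent of them still believe the rumor",
        "In 2003 and 2020, the pneumonia pandemic spread in the world",
    ],
})

# Assumption: a year is a standalone 4-digit number starting with 19 or 20
YEAR = r'\b(?:19|20)\d{2}\b'
# Assumption: a percentage is either a number immediately followed by '%'
# (the '%' is kept), or a number followed by the word "percent" (number only)
PCT = r'\d+(?:\.\d+)?%|\d+(?:\.\d+)?(?=\s+percent\b)'


def summarize(matches):
    """Join a list of matches with commas, or return '-' when it is empty."""
    return ",".join(matches) if matches else "-"


# findall returns a list of all matches per row (both patterns use only
# non-capturing groups, so each list item is the full matched text)
years = news["STORY"].str.findall(YEAR)
news["existyear"] = years.apply(bool).astype(int)
news["year"] = years.apply(summarize)

percentages = news["STORY"].str.findall(PCT)
news["existpercentage"] = percentages.apply(bool).astype(int)
news["percentage"] = percentages.apply(summarize)

print(news.drop(columns="STORY"))
```

With the real file, the inline DataFrame would be replaced by `pd.read_csv("news.csv")`. Writing the result with `to_csv(..., index=False)` keeps the index from being saved as an extra column. The year pattern is deliberately narrow, so it would need adjusting if the stories mention years outside 1900 to 2099.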
MYYA