
How do I remove exact word matches from a DataFrame column in Python?

Suppose the following DataFrame has a single column named game:


df:
   game
0  juegos blue
1  juego red
2  juegos yellow

I want to remove the words in the following stop-word list from that column:


stopWords = ['juego','juegos']

The expected result is:

df:
   game
0  blue
1  red
2  yellow

I tried:


df['game'] = df['game'].str.replace("|".join(stopWords ), " ")

The call runs, but it removes "juego" from inside the entry "juegos", leaving a stray "s" behind:

df:
   game
0  s blue
1   red
2  s yellow

Is there a way to remove a word only when it matches exactly?


开心每一天1111
2 answers

www says:

You can do this with pandas DataFrame.replace():

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'juegos yellow']})
   ...: stop_words = [r'juego\b', r'juegos\b']
   ...: df.replace(to_replace={'game': '|'.join(stop_words)}, value='', regex=True, inplace=True)
   ...: df
Out[1]:
      game
0     blue
1      red
2   yellow

In [2]: df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'juegos yellow']})
   ...: stop_words = [r'juego\b']
   ...: df.replace(to_replace={'game': '|'.join(stop_words)}, value='', regex=True, inplace=True)
   ...: df
Out[2]:
            game
0    juegos blue
1            red
2  juegos yellow

This assumes each stop "word" ends at a word boundary, \b. As In [2] shows, with only r'juego\b' in the list the entry "juegos" is left untouched, because there is no word boundary between the "o" and the "s".
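A variation on the same idea, as a minimal sketch rather than the only way to do it: build one pattern from the plain word list, put \b on both sides so a stop word embedded at the end of a longer word is also left alone, and tidy up the leftover whitespace. The name pattern and the two cleanup steps are additions for illustration, not part of the answer above.

```python
import re
import pandas as pd

df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'juegos yellow']})
stop_words = ['juego', 'juegos']

# \b on both sides matches whole words only; re.escape protects any
# regex metacharacters that might appear in a stop word.
pattern = r'\b(?:' + '|'.join(map(re.escape, stop_words)) + r')\b'

df['game'] = (
    df['game']
    .str.replace(pattern, '', regex=True)   # drop the stop words
    .str.replace(r'\s+', ' ', regex=True)   # collapse runs of whitespace
    .str.strip()                            # trim leading/trailing spaces
)

print(df['game'].tolist())
```

Passing regex=True explicitly matters here: recent pandas releases treat the pattern in Series.str.replace as a literal string by default.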

明月笑刀无情

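Plain Python string replacement won't work here, but the regular expression module will. You need to add some markers to the pattern so that the regex only finds complete words. For example, you may know a word is complete because it is followed by a period ., a comma ,, any kind of whitespace \s, or the end of the line $. \b is the regex pattern for a word boundary.

import re
s1 = df['game']
for sw in stopWords:
    # re.sub works on one string at a time, so map it over the column
    s1 = s1.map(lambda text: re.sub(r'{0}\b'.format(sw), '', text))
df['game'] = s1

I'm keeping my older code below in case it is useful. This version also removes the space, comma, or period directly after the matched word. That isn't what you asked for, but it may come in handy.

import re
s1 = df['game']
for sw in stopWords:
    # also consume one trailing '.', ',' or whitespace character, or match at end of string
    s1 = s1.map(lambda text: re.sub(r'{0}([.,\s]|$)'.format(sw), '', text))
df['game'] = s1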
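If the stop words are always separated by whitespace, a regex is not strictly needed. The sketch below swaps in a different technique: split each entry into tokens and keep only the tokens that are not in the stop set. The helper name drop_stop_words and the 'videojuego green' row are hypothetical, added only to show that a longer word containing a stop word survives.

```python
import pandas as pd

def drop_stop_words(text, stop_words):
    """Remove every whitespace-separated token that appears in stop_words."""
    return ' '.join(tok for tok in text.split() if tok not in stop_words)

# Hypothetical sample data; the last row is not from the question.
df = pd.DataFrame({'game': ['juegos blue', 'juego red', 'videojuego green']})
stop_set = {'juego', 'juegos'}  # a set gives fast membership checks

df['game'] = df['game'].apply(lambda text: drop_stop_words(text, stop_set))

print(df['game'].tolist())
```

One limitation of this approach: a token with punctuation attached, such as "juego,", is not equal to "juego" and so would be kept. The regex versions above handle that case.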