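I have a list of 4,000 strings that I need to remove from a column of a pandas DataFrame. The code below works on the sample shown, but when I run it on my real DataFrame of 20k+ rows it takes a very long time. Any ideas on how to speed it up?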
import re

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)

# Strings to remove (the real list has about 4,000 entries).
my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']

# Escape each string, join them into one alternation, and compile it once.
p = re.compile('|'.join(map(re.escape, my_list)))

# Replace every match with a single space, row by row.
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
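One direction that might help, offered as a sketch rather than a measured fix: with thousands of alternatives, a flat `a|b|c|...` pattern makes the regex engine try the alternatives one after another at each position. Merging the strings into a prefix tree (trie) before building the pattern lets shared prefixes be matched only once. The helper name `build_trie_regex` and the variables `phrases` and `texts` are made up for this illustration. It assumes the strings are non-empty and shorter than Python's recursion limit, and it is worth timing on the real data before adopting it.

```python
import re


def build_trie_regex(phrases):
    """Compile phrases into one regex in which common prefixes are shared."""
    trie = {}
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match everywhere, so skip it
        node = trie
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[""] = True  # marks that a phrase ends at this node

    def to_pattern(node):
        ends_here = "" in node
        branches = [
            re.escape(ch) + to_pattern(child)
            for ch, child in sorted(node.items())
            if ch != ""
        ]
        if not branches:
            return ""  # leaf: nothing more to match
        if len(branches) == 1 and not ends_here:
            return branches[0]
        group = "(?:" + "|".join(branches) + ")"
        # If a phrase ends here, the continuation is optional (greedy,
        # so the longer phrase is preferred when both could match).
        return group + "?" if ends_here else group

    return re.compile(to_pattern(trie))


phrases = ['how is it going today?n[soldjgf', 'how are you doing today?']
texts = [
    "Hello Sam how is it going today? oh yeah",
    "It is an Hello example how are you doing today?",
    "how is it going today?n[soldjgf ",
]

pattern = build_trie_regex(phrases)
cleaned = [pattern.sub(' ', text) for text in texts]
```

With the DataFrame from the question, the last line would become `df['cleaned_text'] = [pattern.sub(' ', text) for text in df['name']]`. One behavioural difference to be aware of: when one string in the list is a prefix of another, the trie pattern removes the longer one, whereas the flat alternation removes whichever is listed first.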