Speeding up find-and-replace code in Python?

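I have a list of 4,000 strings that I need to remove from a pandas DataFrame column. The code below works on the sample data, but when I run it on my real DataFrame (20k+ rows) it takes a very long time. Any ideas for speeding it up?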


import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf   ",
            "how is it going today Hello World",
        ],
    }
)

# Strings to strip out of the "name" column (about 4,000 in the real data)
my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']

# =============================================================================

# One alternation regex that matches any of the literal strings
p = re.compile('|'.join(map(re.escape, my_list)))

# Replace every match with a single space
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
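One commonly suggested alternative (not taken from this thread) is to compile the literals into a trie-shaped regex, so strings that share a prefix share one branch and the engine does not have to try thousands of separate alternatives at each position. A minimal sketch, assuming the helper names `build_trie`, `trie_to_regex` and `compile_literals` (all hypothetical). Two caveats: where one literal is a prefix of another, this pattern prefers the longer match, whereas `'|'.join(...)` takes whichever alternative is listed first; and the recursion goes one level per character, so literals longer than roughly 900 characters would hit Python's default recursion limit. Measure on your own data before relying on it.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key '' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[''] = {}  # end-of-word marker
    return root


def trie_to_regex(node):
    """Turn a trie node into a regex that matches every word below it."""
    is_end = '' in node
    # One branch per child character, escaped so it matches literally
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != '']
    if not branches:
        return ''
    body = branches[0] if len(branches) == 1 else '(?:' + '|'.join(branches) + ')'
    # A word ends here, so the rest is optional (greedy: longer match preferred)
    return '(?:' + body + ')?' if is_end else body


def compile_literals(words):
    """Hypothetical helper: compile literal strings into one trie regex."""
    words = [w for w in words if w]  # an empty literal would match everywhere
    if not words:
        raise ValueError("need at least one non-empty string")
    return re.compile(trie_to_regex(build_trie(words)))


my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']
p = compile_literals(my_list)
print(p.pattern)
print(repr(p.sub(' ', "It is an Hello example how are you doing today?")))
```

The compiled pattern is a drop-in replacement for `p` in the list comprehension above, e.g. `df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]`.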


杨__羊羊
1 Answer

绝地无双

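Use Series.str.replace():

p = re.compile('|'.join(map(re.escape, my_list)))
# regex=True is needed for a compiled pattern in pandas 2.0+ (the default became regex=False)
df['cleaned_text'] = df['name'].str.replace(p, ' ', regex=True)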
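Whether `.str.replace` is actually faster than the list comprehension is worth measuring rather than assuming. A minimal timing sketch using the standard-library `timeit`; the ~20k-row frame is hypothetical test data built by repeating the question's five rows, and the function names are made up for illustration.

```python
import re
import timeit

import pandas as pd

# Hypothetical test data: the question's five rows repeated to 20,000 rows
rows = [
    "Hello Sam how is it going today? oh yeah",
    "Hello Jane how is it going today? oh yeah",
    "It is an Hello example how are you doing today?",
    "how is it going today?n[soldjgf   ",
    "how is it going today Hello World",
]
df = pd.DataFrame({"name": rows * 4000})
my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']
p = re.compile('|'.join(map(re.escape, my_list)))


def with_list_comprehension():
    # The question's approach: call the compiled regex once per row
    return [p.sub(' ', text) for text in df['name']]


def with_str_replace():
    # The answer's approach; regex=True is required for a compiled pattern in pandas 2.0+
    return df['name'].str.replace(p, ' ', regex=True)


# Both approaches should produce identical text
assert with_list_comprehension() == with_str_replace().tolist()

for fn in (with_list_comprehension, with_str_replace):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per run")
```

If the two timings come out similar, the bottleneck is more likely the 4,000-way alternation itself than the pandas call around it.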
