Speeding up find-and-replace code in Python?

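Problem description

I have a list of 4,000 strings that need to be removed from a pandas DataFrame column. The code below works on the example shown, but it takes a very long time when I run it on my real DataFrame of 20k+ rows. Any ideas on how to speed it up?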

import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf   ",
            "how is it going today Hello World",
        ],
    }
)


my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']
# Build one alternation pattern from the escaped strings, then substitute row by row.
p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]

Tags: python, pandas, performance, text

Solution

Use Series.str.replace(), called through the column's .str accessor:

p = re.compile('|'.join(map(re.escape, my_list)))

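# regex=True is needed with a compiled pattern: pandas 2.0+ defaults to regex=False and rejects it otherwise.
df['cleaned_text'] = df['name'].str.replace(p, ' ', regex=True)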
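
Whether the .str.replace() call actually beats the list comprehension depends on the data, since both run the same compiled pattern over every row. The sketch below is one way to check this on the question's sample data: it confirms that the two approaches give the same result and times each one. The helper names (build_pattern, clean_with_str_replace, clean_with_comprehension) are hypothetical, and sorting the strings longest-first is an optional addition that is not part of the original answer; it only matters when one string in the list is a prefix of another.

```python
import re
import timeit

import pandas as pd


def build_pattern(strings):
    # Hypothetical helper. Longest strings go first so that, when one entry
    # is a prefix of another, the longer entry is tried first.
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))


def clean_with_str_replace(series, pattern):
    # Vectorized string accessor; regex=True lets pandas accept a compiled pattern.
    return series.str.replace(pattern, " ", regex=True)


def clean_with_comprehension(series, pattern):
    # The approach from the question: one re.sub call per row.
    return pd.Series([pattern.sub(" ", text) for text in series], index=series.index)


names = pd.Series(
    [
        "Hello Sam how is it going today? oh yeah",
        "Hello Jane how is it going today? oh yeah",
        "It is an Hello example how are you doing today?",
        "how is it going today?n[soldjgf   ",
        "how is it going today Hello World",
    ]
)
my_list = ["how is it going today?n[soldjgf", "how are you doing today?"]

p = build_pattern(my_list)
via_accessor = clean_with_str_replace(names, p)
via_loop = clean_with_comprehension(names, p)

# Both approaches should produce identical text.
same_result = via_accessor.tolist() == via_loop.tolist()
print("same result:", same_result)

# Rough timing; on real data, replace `names` and `my_list` with the full inputs.
t_accessor = timeit.timeit(lambda: clean_with_str_replace(names, p), number=200)
t_loop = timeit.timeit(lambda: clean_with_comprehension(names, p), number=200)
print(f"str.replace: {t_accessor:.4f}s, list comprehension: {t_loop:.4f}s")
```

If the two timings are close on the real 20k-row data, the cost is probably dominated by the 4,000-way alternation itself rather than by how the pattern is applied.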
