Regex replacement in pandas str that excludes partial matches

Problem description

I am trying to replace 'hi' and 'hello' with 111, but I am stuck with pandas.str.replace(). Any suggestions? Thanks!

import re
import pandas as pd

a1 = pd.Series('12:04:25 Roberts: Hi, Hello, hi this hi')

# the pattern below also replaces the 'hi' inside 'this'
a1.str.replace(r'(hello|hi)', '111', regex=True, flags=re.IGNORECASE)
# -> 12:04:25 Roberts: 111, 111, 111 t111s 111

# if I use '^hi$', then 'Hi' is kept
a1.str.replace(r'(hello|^hi$)', '111', regex=True, flags=re.IGNORECASE)
# -> 12:04:25 Roberts: Hi, 111, hi this hi

# allowing for a space and a comma still gives the same result
a1.str.replace(r'(hello|^\s?hi,?$)', '111', regex=True, flags=re.IGNORECASE)
# -> 12:04:25 Roberts: Hi, 111, hi this hi


Tags: python, regex, pandas, string

Solution


You can try adding a lookbehind that requires whitespace or a comma immediately before the match:

>>> a1.str.replace(r'(?<=\s|,)(hello|hi)', '111', regex=True, flags=re.IGNORECASE)
0    12:04:25 Roberts: 111, 111, 111 this 111
dtype: object
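As a possible alternative to the lookbehind, word boundaries (\b) may be worth trying. The attempts in the question fail because ^ and $ anchor to the start and end of the whole string, not to individual words. The lookbehind avoids that, but it only checks the character before the match, so it would still replace the 'hi' at the start of a word such as 'history', and it would miss a 'hi' at the very beginning of the string. The sketch below is not part of the original answer; the second series `b1` and the variable names are made up for illustration.

```python
import re
import pandas as pd

a1 = pd.Series('12:04:25 Roberts: Hi, Hello, hi this hi')

# \b matches at the edge of a word, so the 'hi' inside 'this' is skipped
word_pattern = r'\b(?:hello|hi)\b'
result = a1.str.replace(word_pattern, '111', regex=True, flags=re.IGNORECASE)
print(result[0])

# Hypothetical second input, used only to compare the two patterns
b1 = pd.Series('hi history')
lookbehind_pattern = r'(?<=\s|,)(hello|hi)'
with_lookbehind = b1.str.replace(lookbehind_pattern, '111', regex=True, flags=re.IGNORECASE)
with_boundary = b1.str.replace(word_pattern, '111', regex=True, flags=re.IGNORECASE)
print(with_lookbehind[0])
print(with_boundary[0])
```

On the original series both patterns give the same result. They differ only on inputs like `b1`, so which one fits depends on what the real data looks like.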
