Regular expression x.group()

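Problem description

Please explain, step by step, how the result below is produced, including answers to the following questions. Thank you!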

df['text'].str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3])

  1. What transformation does Series.str perform? I could not inspect it.
  2. What is inside x, and what does groups() do in x.groups()?
  3. Why is there a [0] in x.groups()[0][:3]?

Given the DataFrame df below:

0   Monday: The doctor's appointment is at 2:45pm.
1   Tuesday: The dentist's appointment is at 11:30...
2   Wednesday: At 7:00pm, there is a basketball game!
3   Thursday: Be back home by 11:15 pm at the latest.
4   Friday: Take the train at 08:10 am, arrive at ...

the code above transforms it into:

0          Mon: The doctor's appointment is at 2:45pm.
1       Tue: The dentist's appointment is at 11:30 am.
2          Wed: At 7:00pm, there is a basketball game!
3         Thu: Be back home by 11:15 pm at the latest.
4    Fri: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object

Tags: pandas, re

Solution


As a complement to @AnuragDabas's comment, here is a breakdown of the processing using Python's re module:

>>> import re
>>> s = "Monday: The doctor's appointment is at 2:45pm."

>>> re.search(r'(\w+day\b)', s) # find any word ending in "day"
<re.Match object; span=(0, 6), match='Monday'>

>>> re.search(r'(\w+day\b)', s).groups() # get the matching groups
('Monday',)

>>> re.search(r'(\w+day\b)', s).groups()[0] # take the first element
'Monday'

>>> re.search(r'(\w+day\b)', s).groups()[0][:3] # get the first 3 characters
'Mon'
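
To make the role of the [0] index clearer, here is a small sketch with a hypothetical pattern that has two capturing groups (this pattern is not from the question, it is only for illustration). groups() returns a tuple with one entry per capturing group, so an index is needed to pick one of them, even when the pattern has a single group:

```python
import re

# Hypothetical two-group pattern, used only to show what groups() returns
m = re.search(r'(\w+)day: (\w+)', "Monday: The doctor's appointment")

print(m.groups())  # ('Mon', 'The'): one tuple entry per capturing group
print(m.group(0))  # 'Monday: The': the whole match
print(m.group(1))  # 'Mon': same as m.groups()[0]
```

With the single-group pattern from the question, x.groups()[0] and x.group(1) should refer to the same string.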

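When used in the context of pandas.Series.str.replace, the lambda is passed to the re.sub function (as defined in the documentation), and its return value is used as the replacement for each match (so "ABCDEFday" is replaced with "ABC").

The documentation describes the second parameter of .str.replace as follows: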

repl: str or callable

    Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
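
As a rough illustration of what happens for each element of the Series, the same substitution can be run with re.sub directly. This is a minimal sketch; the function name shorten is hypothetical and simply stands in for the lambda from the question:

```python
import re

def shorten(match):
    # match is the re.Match object that re.sub passes in for each match;
    # groups()[0] is the text captured by the first (and only) group
    return match.groups()[0][:3]

s = "Monday: The doctor's appointment is at 2:45pm."
result = re.sub(r'(\w+day\b)', shorten, s)
print(result)  # Mon: The doctor's appointment is at 2:45pm.
```

Note that in recent pandas versions (2.0 and later, as far as I know) the regex parameter of .str.replace defaults to False, so a callable replacement would need regex=True to be passed explicitly.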

NB. The regex is flawed in that it processes every word ending in "day". So if a line contains, for example, Saturday: this is my birthday and not a workday!, the result is Sat: this is my bir and not a wor!
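
One possible way around this, sketched here under the assumption that only the seven English weekday names should be shortened, is to list them explicitly in the pattern. The names WEEKDAY and abbreviate are hypothetical and not part of the original answer:

```python
import re

# Assumed stricter pattern: only the seven weekday names, as whole words
WEEKDAY = re.compile(r'\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b')

def abbreviate(text):
    # group(0) is the whole match; keep its first three characters
    return WEEKDAY.sub(lambda m: m.group(0)[:3], text)

s = "Saturday: this is my birthday and not a workday!"
loose = re.sub(r'(\w+day\b)', lambda m: m.groups()[0][:3], s)
strict = abbreviate(s)
print(loose)   # Sat: this is my bir and not a wor!
print(strict)  # Sat: this is my birthday and not a workday!
```

If the weekday always appears at the start of the line, anchoring the original pattern with ^ might be a simpler alternative.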

