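python - Python Pandas regex for specific strings
Problem description
I want to iterate over a column of records (string directory paths) and extract the record ID inside the parentheses. However, in some cases the details in parentheses are not a record ID and need to be ignored.
Code:
df1['Doc ID'] = df['Folder Path'].str.extract(r'.*\((.*)\).*', expand=True)  # this does not ignore instances with (2018-03) or (yyyy-mm)
I also tried:
df1['Doc ID'] = df['Folder Path'].str.extract(r'\((?!date_format)([^()]+)\)', expand=True)  # this does not ignore (Data Only)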
   Folder Path                                         Doc ID
1  /report/support + admin. (256)/ Global (2018-03)    (256)                # ignores (2018-03)
2  /reports/limit/sector(139)/2017                     (139)
3  /reports/sector/region(147,189 and 132)/2018        (147, 189 and 132)
4  /reports/support.(Data Only)/Region (2558)          (2558)               # ignores (Data Only)
Solution
This uses a negative lookahead to skip "(Data Only)" and a character class that excludes hyphens to skip date formats such as (2018-03):
(\((?!Data Only)[^\-]+\))
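To see what each part of the pattern does, here is a minimal sketch using Python's built-in re module on a few made-up strings (the same pattern string can be passed to Series.str.extract). The helper name find_id and the third sample string are illustrative assumptions, not part of the original answer.

```python
import re

# \(              literal opening parenthesis
# (?!Data Only)   negative lookahead: fail if "Data Only" follows the "("
# [^\-]+          one or more characters that are not a hyphen (rules out 2013-08)
# \)              literal closing parenthesis
PATTERN = re.compile(r'(\((?!Data Only)[^\-]+\))')

def find_id(text):
    """Return the first parenthesised group accepted by PATTERN, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

print(find_id('(Data Only) text (1, 2 and 3)'))  # skips "(Data Only)"
print(find_id('(2013-08) foo (123)'))            # skips the date
print(find_id('(Data Only) only'))               # nothing acceptable to match
```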
Setup:
import pandas as pd

df = pd.DataFrame(
    {'Path': ['(Data Only) text (1, 2 and 3)',
              '(2013-08) foo (123)',
              '(Data Only) bar (1,2,3,4,5 and 6)']}
)
                                Path
0      (Data Only) text (1, 2 and 3)
1                (2013-08) foo (123)
2  (Data Only) bar (1,2,3,4,5 and 6)
Using str.extract:
df.Path.str.extract(r'(\((?!Data Only)[^\-]+\))', expand=True)
                   0
0       (1, 2 and 3)
1              (123)
2  (1,2,3,4,5 and 6)
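One caveat: [^\-]+ also matches parentheses, so a string containing two hyphen-free groups is matched as a single span. A possible stricter variant, offered as a sketch rather than the answer's own method, forbids crossing another parenthesis with [^()]+, spells out the yyyy-mm date shape as an assumed \d{4}-\d{2}, and captures only the text inside the parentheses. The names STRICT and extract_doc_id are hypothetical; the paths are the ones from the question.

```python
import re

# The answer's pattern, kept for comparison.
LOOSE = re.compile(r'(\((?!Data Only)[^\-]+\))')

# Stricter variant (an assumption, not from the original answer):
#   (?!Data Only\))     skip exactly "(Data Only)"
#   (?!\d{4}-\d{2}\))   skip dates shaped like "(2018-03)"
#   ([^()]+)            capture the contents, never crossing another parenthesis
STRICT = re.compile(r'\((?!Data Only\))(?!\d{4}-\d{2}\))([^()]+)\)')

def extract_doc_id(path):
    """Return the first record ID in path (without parentheses), or None."""
    match = STRICT.search(path)
    return match.group(1) if match else None

paths = [
    '/report/support + admin. (256)/ Global (2018-03)',
    '/reports/limit/sector(139)/2017',
    '/reports/sector/region(147,189 and 132)/2018',
    '/reports/support.(Data Only)/Region (2558)',
]
doc_ids = [extract_doc_id(p) for p in paths]
print(doc_ids)

# The loose pattern merges two groups into one span; the strict one does not.
loose_result = LOOSE.search('(12) and (34)').group(1)
strict_result = extract_doc_id('(12) and (34)')
print(loose_result, '|', strict_result)
```

With pandas, the same pattern string could be applied as df['Folder Path'].str.extract(STRICT.pattern, expand=False), which returns a Series because the pattern has exactly one capture group.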