Using regex to find (and replace) phone number extensions (Python)

Problem description

I'm trying to find phone number extensions in a pandas Series, for example 'Ext: 123'. The extension can appear in a cell either on its own (as above) or after a phone number, e.g. 123 456 789 / Ext: 4502.

The extensions can also come in varying formats, such as Ex.430 (missing the letter t, no space after the punctuation mark). Therefore, I want to find all sequences in the series that have 1-3 letters, followed by zero or more symbols, zero or more spaces, and then 2 to 6 digits.

Ideally, I would also replace these with the correct format, which is Ext: 32 (the number can be up to 6 digits).

Here is my regex so far:

({'\D{1,3}\W*\s*\d{2,6}]'

I have also used other variations, but those didn't work either.

I would appreciate any help, thanks.

Tags: python, regex

Solution


You can split the column on runs of alphabetic characters (plus the colon).

df['phones'].str.split(r'[A-Za-z:]+\.?', expand=True)
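The split above separates the number from the extension but does not rewrite anything. For the replacement part of the question, a minimal sketch along the lines the asker describes (1-3 letters, optional symbols and spaces, 2 to 6 digits) might look like the following. The names EXT_PATTERN and normalize_extension are hypothetical, and the pattern is deliberately as loose as the question's wording, so it would also rewrite other short prefixes such as 'Tel: 123'.

```python
import re

# Hypothetical pattern following the question's description:
#   \b            start at a word boundary so only short words match
#   [A-Za-z]{1,3} one to three letters (e.g. "Ext", "Ex", "x")
#   \W*           any symbols and spaces (\W already covers whitespace)
#   (\d{2,6})     the extension digits, captured for reuse
#   (?!\d)        refuse to match if more digits follow (7+ digit runs)
EXT_PATTERN = re.compile(r'\b[A-Za-z]{1,3}\W*(\d{2,6})(?!\d)')


def normalize_extension(text):
    """Rewrite any extension found in text to the 'Ext: <digits>' form."""
    return EXT_PATTERN.sub(r'Ext: \1', text)


samples = ['Ext: 123', 'Ex.430', '123 456 789 / Ext: 4502', '123 456 789']
for sample in samples:
    print(repr(sample), '->', repr(normalize_extension(sample)))
```

On a pandas Series, the same compiled pattern can be passed to str.replace, e.g. df['phones'].str.replace(EXT_PATTERN, r'Ext: \1', regex=True), assuming the column is named 'phones' as in the answer above. If only genuine extension prefixes should be rewritten, replacing [A-Za-z]{1,3} with something stricter such as (?:ext?|x) together with re.IGNORECASE is one option.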
