Extract specific words from string

Problem description

I have a DataFrame like this:

Column_A
1. A lot of text inhere, but I want all words that have a comma in the middle. Like this: hello,world. A string can contain multiple relevant words, like hello,python and we have also many                         whit                spaces              in          the text   
2. What I want is to abstract,all words with that pattern. Not sure if it has an impact, but some parts of the strings containing "this signs". or "this,signs"                                     thanks  for helpingme                    greets! 

Desired outcome:

hello,world
hello,python
abstract,all
"this,signs"

I tried to do this with this code:

df['B'] = df['Column_A'].str.findall(r',').str.join(' ').str.strip()

But that does not give me the desired outcome.

Tags: python, regex, string, pandas

Solution


Given the specific format of the expected output, it looks like you could use:

from itertools import chain

import pandas as pd

# Find every "word,word" match per row, then flatten the per-row lists
l = chain.from_iterable(df.Column_A.str.findall(r'\w+,\w+').values.tolist())
pd.DataFrame(l, columns=['Column_A'])

      Column_A
0   hello,world
1  hello,python
2  abstract,all
3    this,signs
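
Note that the output above drops the double quotes around this,signs, while the desired outcome in the question keeps them. If the quotes matter, one possible variant is sketched below. It is only a sketch under assumptions: the sample strings are abbreviated from the question, the names pattern, matches and result are made up for illustration, and Series.explode (available since pandas 0.25) is swapped in for itertools.chain.

```python
import pandas as pd

# Sample rows abbreviated from the question (assumption: the column is named Column_A)
df = pd.DataFrame({
    "Column_A": [
        "Like this: hello,world. A string can contain hello,python and   many   spaces",
        'I want to abstract,all words. Some parts contain "this signs". or "this,signs"   thanks',
    ]
})

# An optional double quote on either side is kept as part of the match
pattern = r'"?\w+,\w+"?'

# findall returns one list per row; explode turns each list element into its own row.
# Rows without any match become NaN after explode, so they are dropped.
matches = (
    df["Column_A"]
    .str.findall(pattern)
    .explode()
    .dropna()
    .reset_index(drop=True)
)

result = matches.to_frame("Column_A")
print(result)
```

Because \w also matches digits and underscores, a token such as 1,000 would be picked up as well. Replacing \w with [A-Za-z] is one way to restrict the match to letters, if that is what is wanted.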
