python - Extract specific words from string
Problem description
I have a DataFrame like this:
Column_A
1. A lot of text inhere, but I want all words that have a comma in the middle. Like this: hello,world. A string can contain multiple relevant words, like hello,python and we have also many whit spaces in the text
2. What I want is to abstract,all words with that pattern. Not sure if it has an impact, but some parts of the strings containing "this signs". or "this,signs" thanks for helpingme greets!
Desired outcome:
hello,world
hello,python
abstract,all
"this,signs"
I tried to do this with this code:
df['B'] = df['Column_A'].str.findall(r',').str.join(' ').str.strip()
But that does not give me the desired outcome.
Solution
Given the specific format of the expected output, it seems you could use:
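import pandas as pd
from itertools import chain

# Find every word,word match per row, then flatten the per-row lists into one list
l = list(chain.from_iterable(df.Column_A.str.findall(r'\w+,\w+').values.tolist()))
pd.DataFrame(l, columns=['Column_A'])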
Column_A
0 hello,world
1 hello,python
2 abstract,all
3 this,signs
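
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample rows are shortened versions of the text in the question and are only an assumption about what the real data looks like. It swaps `itertools.chain` for `Series.explode`, which should give the same result on pandas 0.25 or later.

```python
import pandas as pd

# Sample data modelled on the question (shortened; assumed for illustration)
df = pd.DataFrame({
    "Column_A": [
        "Like this: hello,world. A string can contain hello,python and spaces",
        'What I want is to abstract,all words. Some parts contain "this signs". or "this,signs"',
    ]
})

# r',' matches only the comma itself; r'\w+,\w+' also captures the word on each side
matches = df["Column_A"].str.findall(r"\w+,\w+")

# explode() puts each match on its own row; dropna() discards rows without any match
result = matches.explode().dropna().reset_index(drop=True).to_frame("Column_A")
print(result)
```

Note that `\w` does not match quote characters, so `"this,signs"` comes back without its quotes. If the quotes should be kept, a pattern such as `r'"?\w+,\w+"?'` might work, though that is untested against the real data.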