python - How to insert a searched string into a new row in pandas
Problem description
I have a sample Excel sheet containing the following data:
Sno String
1 ram has a ball
2 karan has a ball
3 raj has a ball
4 seema has a ball
5 raj has a ball raj
From the df above, I want to extract a specific set of names and write them to new columns:
Sno  String             names  count
1    ram has a ball     ram    1
2    karan has a ball   karan  2
3    raj has a ball     raj    3
4    seema has a ball
5    raj has a ball     raj    4
The code I am using:
import pandas as pd
import re
df = pd.read_excel(r'<path>')
array = {'ram','karan','raj'}
count = 0
for index, row in df.iterrows():
    sql = row["string"]
    for i in array:
        str = re.findall(i,sql)
        if str:
            count = count + 1
        else :
            continue
        df["name"] = pd.Series(str)
        df["count"] = pd.Series(count)
path_to_file = r'<path>'
df.to_excel(path_to_file)
The code runs, but it only writes a value for one row instead of writing a value for every row it iterates over.
The output I get:
Sno  String             names  count
1    ram has a ball     ram    1
2    karan has a ball
3    raj has a ball
4    seema has a ball
5    raj has a ball
Can anyone help me solve this problem?
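One plausible reason only the first row gets filled is pandas index alignment: `pd.Series(str)` holds a single element with index label 0, and assigning it to a column matches on index labels, so only row 0 gets a value. The sketch below illustrates this on a small in-memory frame rather than the Excel file. The column names `name_aligned` and `name_per_row` are made up for the illustration, and it is an assumption that the assignments in the original loop behave this way.

```python
import re
import pandas as pd

# Small in-memory stand-in for the Excel data (assumption for illustration).
df = pd.DataFrame({"String": ["ram has a ball", "karan has a ball", "seema has a ball"]})
names = ["ram", "karan", "raj"]

# A one-element Series only has index label 0, so column assignment
# aligns on the index and leaves every other row as NaN.
df["name_aligned"] = pd.Series(["ram"])

# Collecting one value per row and assigning the full list once
# gives every row its own value (None where nothing matched).
pattern = "|".join(names)
found = []
for text in df["String"]:
    match = re.search(pattern, text)
    found.append(match.group(0) if match else None)
df["name_per_row"] = found

print(df)
```

A loop like this works, but it is mainly useful for seeing the alignment behaviour. A vectorized string method usually does the same job with less code.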
Solution
We can use str.findall from pandas:
df['names'] = df.String.str.findall('|'.join(array)).str[0]
df['cnt'] = df.names.notna().cumsum().mask(df.names.isna())
df
Out[176]:
             String  names  cnt
0    ram has a ball    ram  1.0
1  karan has a ball  karan  2.0
2    raj has a ball    raj  3.0
3  seema has a ball    NaN  NaN
4    raj has a ball    raj  4.0
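For anyone who wants to try this without an Excel file, here is a self-contained sketch of the same approach. The in-memory DataFrame stands in for the sheet from the question. The use of `re.escape`, a list instead of a set, and the final `astype("Int64")` are additions that are not part of the answer above: they keep the pattern order deterministic, guard against regex metacharacters in the names, and keep the counts as integers instead of floats.

```python
import re
import pandas as pd

# In-memory stand-in for the Excel sheet from the question (assumption).
df = pd.DataFrame({
    "Sno": [1, 2, 3, 4, 5],
    "String": [
        "ram has a ball",
        "karan has a ball",
        "raj has a ball",
        "seema has a ball",
        "raj has a ball",
    ],
})

names = ["ram", "karan", "raj"]  # a list keeps the pattern order deterministic

# One alternation pattern; re.escape guards against regex metacharacters.
pattern = "|".join(re.escape(n) for n in names)

# findall returns a list per row; .str[0] takes the first match
# and yields NaN where the list is empty.
df["names"] = df["String"].str.findall(pattern).str[0]

# Running count over the matched rows, blanked where nothing matched.
matched = df["names"].notna()
df["cnt"] = matched.cumsum().mask(~matched).astype("Int64")

print(df)
```

Note that the pattern matches substrings, so a name embedded in a longer word would also be found. If that matters, wrapping each name in `\b` word boundaries is one option.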