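python - How to split strings at every occurrence of a specific pattern in a DataFrame in Python
Problem description
I have the following DataFrame, which contains many authors and their affiliations. [image: dataframe_before]
In the Affiliation column there is a pattern 'Department of ...' that I need to split out for each author. Note that this pattern can occur more than once in each row (author). I need to extract every 'Department of ...' match for each author and store it in a separate column or row assigned to that author. (I need to do this in Python.) The image below shows the expected result. [image: expected result]
I would appreciate any help.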
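Solution
To make it easy to separate the matches and then assign them to new columns, you can use extractall, which returns one row per match with a MultiIndex on the rows; unstack then rearranges those rows into columns.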
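As a minimal sketch of what extractall and unstack each return, the example below uses a hypothetical two-row Series (the strings are made up for illustration and are not from the question's data):

```python
import pandas as pd

# Hypothetical sample strings, used only to illustrate the shape of the result.
s = pd.Series([
    "Department of A, Univ X, Department of B, Univ Y",
    "Department of C, Univ Z",
])

# One row per match; the row index is a MultiIndex of
# (original row label, match number), and the capture group is column 0.
matches = s.str.extractall(r"(Department of .*?),")
print(matches)

# Pivot the match number into columns; rows with fewer matches get NaN.
wide = matches[0].unstack()
print(wide)
```

Note that this pattern only matches a department name that is followed by a comma, so a name at the very end of a string is skipped.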
Input used, saved as data.csv:
Author_ID,Affiliation
6504356384,"Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, Ml 48109, United States, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Ml 48109, United States"
57194644787,"Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, United States, Texas Children's Microbiome Center, Texas Children's Hospital, Houston, TX, United States, Department of Pathology, Texas Children's Ho:"
57194687826,"Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 2C1, Canada, Department of Computer Science, Faculty of Science, Western University, London, ON N6A 2C1, Canada, Depart"
123456789,"Department of RegexTest, Address and Numbers, Department of RegexTest, Faculty of Patterns, Department of RegexTest, Department of RegexTest, City and Place"
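import pandas as pd

df = pd.read_csv("data.csv")
print(df)

# Capture every "Department of ..." up to the next comma (one row per match).
dept_names = df["Affiliation"].str.extractall(r"(Department of .*?),")
# Pivot the match number into columns named Affiliation0, Affiliation1, ...
affdf = dept_names.unstack()[0].add_prefix("Affiliation")
# Clear the "match" label from the column axis.
affdf = affdf.rename_axis("", axis="columns")
# Put Author_ID first; values are aligned on the row index.
affdf.insert(0, "Author_ID", df["Author_ID"])
print(affdf)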
Output of affdf:
   Author_ID    Affiliation0              Affiliation1              Affiliation2             Affiliation3
0  6504356384   Department of Cell an...  Department of Computa...  NaN                      NaN
1  57194644787  Department of Patholo...  Department of Pathology   NaN                      NaN
2  57194687826  Department of Biochem...  Department of Compute...  NaN                      NaN
3  123456789    Department of RegexTest   Department of RegexTest   Department of RegexTest  Department of RegexTest
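The question also allows the departments to be stored in separate rows rather than columns. A possible sketch of that long layout follows, assuming a shortened, hypothetical stand-in for data.csv read from a string; the column name Department and the variable long_df are illustrative choices, not part of the original answer:

```python
import io
import pandas as pd

# Hypothetical miniature of data.csv with shortened affiliations.
csv_text = '''Author_ID,Affiliation
1,"Department of A, Univ X, Department of B, Univ Y"
2,"Department of C, Univ Z"
'''
df = pd.read_csv(io.StringIO(csv_text))

long_df = (
    df["Affiliation"]
    .str.extractall(r"(Department of .*?),")[0]
    .rename("Department")
    .reset_index(level="match", drop=True)  # keep only the original row label
    .to_frame()
    .join(df["Author_ID"])                  # align on the original row label
    [["Author_ID", "Department"]]
    .reset_index(drop=True)
)
print(long_df)
```

Each author then appears once per extracted department, which avoids the NaN padding of the wide layout when authors have different numbers of departments.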