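python - Split a pandas DataFrame column in two after the first letter in each cell
Question
I want to split one column of a pandas DataFrame into two. Every entry in the Percentage column (see below) starts with an uppercase letter, and I want to split the column immediately after that letter, putting the letter in a new column labeled "Amino Acid".
Current code: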
import pandas as pd
df = pd.read_csv('foo.csv')
df['Amino Acid'], df['Percentage'] = zip(*df['Percentage'].map(lambda x: x.split('[^a-zA-Z]')))
df.to_csv('bar.csv',index=False)
Sample input data
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
| Species | ID | OGT | DB | Percentage |
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | E is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | R is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | A is 10.22668778459711% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
+-----------------------------+-------+-----+-----------+---------------------------------------------------------------------------------------------+
Desired output
+-----------------------------+-------+-----+-----------+------------+--------------------------------------------------------------------------------------------+
| Species | ID | OGT | DB | Amino Acid | Percentage |
+-----------------------------+-------+-----+-----------+------------+--------------------------------------------------------------------------------------------+
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | E | is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | R | is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
| Halogeometricum borinquense | 60847 | 37 | ATCC/DSMZ | A | is 10.22668778459711% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa |
+-----------------------------+-------+-----+-----------+------------+--------------------------------------------------------------------------------------------+
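Solution
Since every entry starts with a single letter, you can take that letter directly by slicing with the .str accessor and keep the rest of the string as the new Percentage value: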
df['Amino Acid'] = df['Percentage'].str[0]
df['Percentage'] = df['Percentage'].str[1:]
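A regex-based variant may also be worth considering. The attempt in the question likely fails because Python's built-in str.split treats its argument as a literal separator rather than a regular expression, so each row comes back as a one-element list that cannot be unpacked into two columns. The sketch below uses Series.str.extract instead. It assumes the data looks like the sample rows in the question; the small in-memory frame and the name `parts` are illustrative stand-ins, not part of the original post.

```python
import pandas as pd

# Illustrative stand-in for the Percentage column read from foo.csv.
df = pd.DataFrame({
    "Percentage": [
        "E is 8.333003365670164% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa",
        "R is 6.310991522830762% in ./archaea/GCF_000337855.1/GCF_000337855.1_ASM33785v1_protein.faa",
    ]
})

# str.split uses a literal separator, so a regex-looking pattern matches nothing.
assert "E is 8%".split("[^a-zA-Z]") == ["E is 8%"]

# Group 0: the leading letter. Group 1: everything after it, minus leading whitespace.
parts = df["Percentage"].str.extract(r"^([A-Za-z])\s*(.*)$")

# Place the new column just before Percentage, as in the desired output.
df.insert(df.columns.get_loc("Percentage"), "Amino Acid", parts[0])
df["Percentage"] = parts[1]
```

Rows that do not start with a letter come back as NaN from str.extract, which makes malformed entries easy to spot, whereas plain slicing cuts every row after its first character regardless. Note also that slicing with .str[1:] keeps the space that follows the letter; chaining .str.lstrip() would remove it if that matters for the output file.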