python - 从 excel/csv 表中提取特定值以在 Python 中形成另一个表
问题描述
我有一张带有下表的 excel 表。
Sort,Realm,Subrealm,Kingdom,Subkingdom,Phylum,Subphylum,Class,Subclass,Order,Suborder,Family
1,,,Virus,,Negarnaviricota,Haploviricotina,Chunqiuviricetes,,Muvirales,,Qinviridae
2,,,Virus,,Negarnaviricota,Haploviricotina,Milneviricetes,,Serpentovirales,,Aspiviridae
3,,,Virus,,Negarnaviricota,Haploviricotina,Monjiviricetes,,Jingchuvirales,,Chuviridae
4,,,Virus,,Negarnaviricota,Polyploviricotina,Ellioviricetes,,Bunyavirales,,Phasmaviridae
5,,,Virus,,,,,,Caudovirales,,Ackermannviridae
6,,,Virus,,,,,,Caudovirales,,Myoviridae
7,,,Virus,,,,,,Caudovirales,,Siphoviridae
我想以预期的结果格式提取数据。
Sort,Name,Rank,Parent
1,Negarnaviricota,Phylum,Virus
1,Haploviricotina,Subphylum,Negarnaviricota
4,Polyploviricotina,Subphylum,Negarnaviricota
1,Chunqiuviricetes,Class,Haploviricotina
2,Milneviricetes,Class,Haploviricotina
3,Monjiviricetes,Class,Haploviricotina
4,Ellioviricetes,Class,Polyploviricotina
1,Muvirales,Order,Chunqiuviricetes
2,Serpentovirales,Order,Milneviricetes
3,Jingchuvirales,Order,Monjiviricetes
4,Bunyavirales,Order,Ellioviricetes
5,Caudovirales,Order,Virus
1,Qinviridae,Family,Muvirales
2,Aspiviridae,Family,Serpentovirales
3,Chuviridae,Family,Jingchuvirales
4,Phasmaviridae,Family,Bunyavirales
5,Ackermannviridae,Family,Caudovirales
6,Myoviridae,Family,Caudovirales
7,Siphoviridae,Family,Caudovirales
提取列的说明:
- 排序 - 与被提取的唯一名称关联的数字
- 名称 - 具有名称的每一列的唯一名称
- Rank - 从中提取名称的列的标题
- 父级 - 前面单元格中的最后一个非空白值
first_row = ['Realm', 'Subrealm', 'Kingdom', 'Subkingdom', 'Phylum', 'Subphylum', 'Class', 'Subclass', 'Order', 'Suborder', 'Family', 'Subfamily', 'Genus', 'Subgenus', 'Species']
df1 = pd.read_csv('ictv-msl-2018-v1.csv', index_col='Sort')
print('Writing to CSV File')
with open('dp1.csv', 'w', newline="") as tbl_writer1:
c2 = csv.writer(tbl_writer1)
c2.writerow(['Name', 'Rank'])
for list_item in first_row:
for item in df1[list_item].unique():
row_content = [item, list_item]
c2.writerow(row_content)
with open('ictv-msl-2018-v1.csv', 'r',) as inp1, open('dp1.csv', 'r') as inp2, open('dp2.csv', 'w', newline="") as out1:
msl_tbl = csv.DictReader(inp1)
tbl_data = list(msl_tbl)
c3 = csv.writer(out1)
for row in csv.reader(inp2):
if row[0] != 'nan':
c3.writerow(row)
else:
print('Deleted Row:', row)
df2 = pd.read_csv('dp2.csv')
print(tbl_data[0]['Species'])
print(len(df2), len(df2.columns))
for i in range(len(df2)):
for j in range(len(df2.columns)):
print(df2.get_values()[i][j], i, j)
有人可以告诉我如何提取以下所有四个列吗?以上代码仅提取Name,Rank
Sort,Name,Rank,Parent
解决方案
使用您的原始输入数据input.csv
:
import csv
with open('input.csv',newline='') as fin:
r = csv.reader(fin)
header = next(r)
names = {}
for row in r:
parent = None
for idx,content in enumerate(row):
if idx == 0: # First column is sort.
sort = content
continue
if not content: # Skip blanks.
continue
if parent is None: # Record the first non-blank content as parent.
parent = content
continue
# Remaining content is names.
# Record name the first time it is seen,
# then update parent.
data = sort,content,header[idx],parent
if content not in names:
names[content] = data
parent = content
def sortkey(data):
sort,name,rank,parent = data
return header.index(rank),int(sort)
with open('output.csv','w',newline='') as fout:
w = csv.writer(fout)
w.writerow('Sort Name Rank Parent'.split())
for data in sorted(names.values(),key=sortkey):
w.writerow(data)
输出:
Sort,Name,Rank,Parent
1,Negarnaviricota,Phylum,Virus
1,Haploviricotina,Subphylum,Negarnaviricota
4,Polyploviricotina,Subphylum,Negarnaviricota
1,Chunqiuviricetes,Class,Haploviricotina
2,Milneviricetes,Class,Haploviricotina
3,Monjiviricetes,Class,Haploviricotina
4,Ellioviricetes,Class,Polyploviricotina
1,Muvirales,Order,Chunqiuviricetes
2,Serpentovirales,Order,Milneviricetes
3,Jingchuvirales,Order,Monjiviricetes
4,Bunyavirales,Order,Ellioviricetes
5,Caudovirales,Order,Virus
1,Qinviridae,Family,Muvirales
2,Aspiviridae,Family,Serpentovirales
3,Chuviridae,Family,Jingchuvirales
4,Phasmaviridae,Family,Bunyavirales
5,Ackermannviridae,Family,Caudovirales
6,Myoviridae,Family,Caudovirales
7,Siphoviridae,Family,Caudovirales
推荐阅读
- vb.net - VB禁用“|” 钥匙
- spring-boot - Spring Boot 配置缓存表达式
- node.js - 流星mogodb用_id插入一个嵌套的对象数组
- github - github页面无法正常运行
- react-native - 如何在 react-native 中发送`application/zip`
- java - 使用应用程序按钮在 android 上切换分屏
- python - 如何使用boto3的export_image?
- javascript - 在我期待测试中的某些内容之前,如何等待“finally”块运行?
- python-3.x - 绘制聚合日期时间索引,错误轴必须设置“频率”以转换为周期
- javascript - 当某人的数据输入到工作表时,如何在 Google 工作表中自动运行脚本?