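python - Is it possible to match the rows of a DataFrame based on the first five characters of a cell?
Problem description
Suppose someone has this DataFrame: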
0 1 2
0 RC100_1 RC100_1 RC100_1
1 RC101_1 RC101_1 RC101_1
2 RC101_3 RC102_1 RC102_1
3 RC102_1 RC103_3 RC103_3
4 RC102_3 RC104_1 RC104_1
5 RC103_3 RC109_2 RC109_2
6 RC104_1 RC111_1 RC111_1
7 RC109_2 RC114_2 RC114_2
8 RC111_1 RC115_2 RC115_2
9 RC114_1 RC116_1 RC116_1
10 RC115_4 RC116_2 RC116_2
11 RC116_1 RC117_2 RC117_2
12 RC117_4 RC117_4 RC117_4
13 RC117_4 RC118_2 RC118_2
14 RC118_1 RC119_2 RC119_2
15 RC119_4 RC120_2 RC120_2
16 RC120_4 RC121_2 RC121_2
17 RC121_4 RC122_2 RC122_2
18 RC122_4 RC125_2 RC125_2
19 RC125_2 RC126_3 RC126_3
20 RC126_3 RC129_2 RC129_2
21 RC129_4 RC12_24 RC12_24
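Is it possible to transform it so that each row contains only entries whose first five characters match, with everything kept in sorted order? I mean without editing each cell by hand. This is what I would like to convert it to: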
0 1 2
0 RC100_1 RC100_1 RC100_1
1 RC101_1 RC101_1 RC101_1
2 RC101_3 NaN NaN
3 RC102_1 RC102_1 RC102_1
4 RC102_3 NaN NaN
5 RC103_3 RC103_3 RC103_3
6 RC104_1 RC104_1 RC104_1
7 RC109_2 RC109_2 RC109_2
8 RC111_1 RC111_1 RC111_1
9 RC114_1 RC114_2 RC114_2
10 RC115_4 RC115_2 RC115_2
11 RC116_1 RC116_1 RC116_1
12 NaN RC116_2 RC116_2
13 RC117_4 RC117_2 RC117_2
14 RC117_4 RC117_4 RC117_4
15 RC118_1 RC118_2 RC118_2
16 RC119_4 RC119_2 RC119_2
17 RC120_4 RC120_2 RC120_2
18 RC121_4 RC121_2 RC121_2
19 RC122_4 RC122_2 RC122_2
20 RC125_2 RC125_2 RC125_2
21 RC126_3 RC126_3 RC126_3
22 RC129_4 RC129_2 RC129_2
23 NaN RC12_24 RC12_24
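Solution
Build a set from all the file names you have, then use it as an index to align the file names from each source. Here is a working example on the data you posted.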
import pandas as pd
import numpy as np
# in your case do something like names_idx = excel_names + csv_names + txt_names
names_idx = ['RC100_1', 'RC100_1', 'RC100_1',
'RC101_1', 'RC101_1', 'RC101_1',
'RC101_3', 'RC102_1', 'RC102_1',
'RC102_1', 'RC103_3', 'RC103_3',
'RC102_3', 'RC104_1', 'RC104_1',
'RC103_3', 'RC109_2', 'RC109_2',
'RC104_1', 'RC111_1', 'RC111_1',
'RC109_2', 'RC114_2', 'RC114_2',
'RC111_1', 'RC115_2', 'RC115_2',
'RC114_1', 'RC116_1', 'RC116_1',
'RC115_4', 'RC116_2', 'RC116_2',
'RC116_1', 'RC117_2', 'RC117_2',
'RC117_4', 'RC117_4', 'RC117_4',
'RC117_4', 'RC118_2', 'RC118_2',
'RC118_1', 'RC119_2', 'RC119_2',
'RC119_4', 'RC120_2', 'RC120_2',
'RC120_4', 'RC121_2', 'RC121_2',
'RC121_4', 'RC122_2', 'RC122_2',
'RC122_4', 'RC125_2', 'RC125_2',
'RC125_2', 'RC126_3', 'RC126_3',
'RC126_3', 'RC129_2', 'RC129_2',
'RC129_4', 'RC12_24', 'RC12_24']
# This is not needed if you already have them separately from globbing earlier
csv_names = names_idx[::3]
excel_names = names_idx[1::3]
txt_names = names_idx[2::3]
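# remove duplicates, then turn the set back into a list so it can serve as an index
names_idx = list(set(names_idx))
# create an empty dataframe with the unique file names as index;
# dtype=object lets the NaN-filled columns hold strings later
df = pd.DataFrame(np.nan, index=names_idx, columns=['csv', 'excel', 'txt'], dtype=object)
# place each file name in the row labelled with that name;
# .loc assigns in one step instead of the chained df.csv[...] = ... form
df.loc[csv_names, 'csv'] = csv_names
df.loc[excel_names, 'excel'] = excel_names
df.loc[txt_names, 'txt'] = txt_names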
print(df)
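Output (the row order is arbitrary because a set is unordered; call df.sort_index() if you want sorted rows)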
csv excel txt
RC111_1 RC111_1 RC111_1 RC111_1
RC125_2 RC125_2 RC125_2 RC125_2
RC114_2 NaN RC114_2 RC114_2
RC116_2 NaN RC116_2 RC116_2
RC129_4 RC129_4 NaN NaN
RC118_1 RC118_1 NaN NaN
RC12_24 NaN RC12_24 RC12_24
RC121_4 RC121_4 NaN NaN
RC126_3 RC126_3 RC126_3 RC126_3
RC129_2 NaN RC129_2 RC129_2
RC102_1 RC102_1 RC102_1 RC102_1
RC101_1 RC101_1 RC101_1 RC101_1
RC109_2 RC109_2 RC109_2 RC109_2
RC119_4 RC119_4 NaN NaN
RC100_1 RC100_1 RC100_1 RC100_1
RC120_2 NaN RC120_2 RC120_2
RC122_2 NaN RC122_2 RC122_2
RC121_2 NaN RC121_2 RC121_2
RC117_4 RC117_4 RC117_4 RC117_4
RC118_2 NaN RC118_2 RC118_2
RC103_3 RC103_3 RC103_3 RC103_3
RC117_2 NaN RC117_2 RC117_2
RC102_3 RC102_3 NaN NaN
RC119_2 NaN RC119_2 RC119_2
RC114_1 RC114_1 NaN NaN
RC116_1 RC116_1 RC116_1 RC116_1
RC101_3 RC101_3 NaN NaN
RC115_2 NaN RC115_2 RC115_2
RC122_4 RC122_4 NaN NaN
RC104_1 RC104_1 RC104_1 RC104_1
RC115_4 RC115_4 NaN NaN
RC120_4 RC120_4 NaN NaN
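Note that the example above aligns on the full file name, so RC114_1 and RC114_2 end up on different rows, whereas the table in the question puts them on the same row. A minimal sketch of aligning on the first five characters instead follows. It is an assumption about what the question intends, not part of the original answer: the function name align_by_prefix and its dict-of-lists input are hypothetical, and repeated prefixes within a column are paired up in sorted order.

```python
import pandas as pd


def align_by_prefix(columns, n=5):
    """Align several lists of names on their first n characters.

    `columns` maps a column label to a list of names (hypothetical input shape).
    """
    aligned = {}
    for label, names in columns.items():
        s = pd.Series(sorted(names), dtype=object)
        prefix = s.str[:n]
        # number repeated prefixes 0, 1, 2, ... so each repeat gets its own row
        occurrence = s.groupby(prefix).cumcount()
        s.index = pd.MultiIndex.from_arrays(
            [prefix, occurrence], names=["prefix", "occurrence"]
        )
        aligned[label] = s
    # outer-join the columns on (prefix, occurrence); gaps become NaN
    return pd.concat(aligned, axis=1).sort_index().reset_index(drop=True)


example = {
    "csv": ["RC101_1", "RC101_3", "RC102_1"],
    "excel": ["RC101_1", "RC102_1", "RC12_24"],
}
result = align_by_prefix(example)
print(result)
```

With the lists from the answer this would be called as align_by_prefix({'csv': csv_names, 'excel': excel_names, 'txt': txt_names}). The occurrence counter is what keeps two names with the same prefix in one column (such as RC101_1 and RC101_3) on separate rows rather than colliding.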