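python - Joining columns of different lengths on specific columns in Pandas
Problem description
I have two different txt files that contain the same number of columns but different numbers of rows, i.e.
file1.txt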
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091
file2.txt
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322
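As you can see, the two files contain different numbers of rows for both A and B. I want to join them with pandas on the first three columns, so that the result looks like this: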
1650,A,2428057133445480,NaN,NaN,NaN,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,NaN,NaN,NaN,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,NaN,NaN,NaN,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943,NaN,NaN,NaN
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107,NaN,NaN,NaN
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537,NaN,NaN,NaN
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091,NaN,NaN,NaN
As I understand it, I can first build the DataFrames and then concatenate them, like this:
df1 = read_data('file1.txt')
df2 = read_data('file2.txt')
pd.concat([df1,df2], ignore_index=True, axis=1)
Is this correct? If not, how can I solve this?
Also, how do I drop the rows that contain NaN, so that the result becomes
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322
Solution
I don't know the column names, so I just used dummy ones:
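import pandas as pd
df1 = pd.read_csv('untitled.txt', header=None, names=list('abcdef'))   # first txt; no header row, so dummy names a-f
df2 = pd.read_csv('untitled1.txt', header=None, names=list('abcghi'))  # second txt; shares the key columns a, b, c
merged = df1.merge(df2, how='outer', on=['a', 'b', 'c'])  # first desired result: NaN where a key is missing from one file
merged.dropna()  # second desired result: only rows whose key appears in both files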
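To see why a key-based merge is needed rather than a column-wise concat, here is a minimal self-contained sketch. It assumes the files have no header row and uses a few rows copied from the question, loaded from in-memory strings instead of files. The column names a to i and the variable names are placeholders, not anything from the original data.

```python
import io
import pandas as pd

# Dummy names for the three key columns (assumption: the files have no header row).
KEYS = ["a", "b", "c"]

# A few rows copied from the question's file1.txt and file2.txt.
FILE1 = """\
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
"""
FILE2 = """\
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,B,2428301534869292,8.522012,-16.99614,9.446322
"""

df1 = pd.read_csv(io.StringIO(FILE1), header=None, names=KEYS + ["d", "e", "f"])
df2 = pd.read_csv(io.StringIO(FILE2), header=None, names=KEYS + ["g", "h", "i"])

# Column-wise concat aligns on the row index (0, 1, 2, ...), so row 0 of
# file1 is paired with row 0 of file2 even though their timestamps differ.
by_position = pd.concat([df1, df2], axis=1, ignore_index=True)

# Outer merge: one row per (a, b, c) key found in either file,
# with NaN in the columns of the file that lacks that key.
outer = (
    df1.merge(df2, how="outer", on=KEYS)
    .sort_values(KEYS)
    .reset_index(drop=True)
)

# Inner merge: only the keys present in both files.
inner = df1.merge(df2, how="inner", on=KEYS)

print(outer)
print(inner)
```

If the only goal is to keep keys present in both files, how="inner" may be preferable to an outer merge followed by dropna(), since dropna() would also discard rows that happened to contain NaN in the original data.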