Concatenating columns of different lengths on specific columns in Pandas

Question

I have two different txt files that contain the same number of columns but a different number of rows, namely:

file1.txt

1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091

file2.txt

1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322

As you can see, the two files have a different number of rows for both A and B. I would like to join them using pandas so that the result is:

1650,A,2428057133445480,NaN,NaN,NaN,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,NaN,NaN,NaN,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,NaN,NaN,NaN,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943,NaN,NaN,NaN
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107,NaN,NaN,NaN
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537,NaN,NaN,NaN
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091,NaN,NaN,NaN

As I understand it, I can first build the DataFrames and then concatenate them, like this:

df1 = read_data('file1.txt')
df2 = read_data('file2.txt')
pd.concat([df1,df2], ignore_index=True, axis=1)

Is this correct? If not, how can I solve this?

Also, how can I drop the rows that contain NaN, so that the result becomes:

1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322

Tags: python, pandas, dataframe

Solution


I don't know the column names, so I just used dummy column names:

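import pandas as pd

# The sample files have no header row, so pass the dummy column names explicitly.
df1 = pd.read_csv('untitled.txt', header=None, names=list('abcdef'))  # first txt, columns a-f
df2 = pd.read_csv('untitled1.txt', header=None, names=list('abcghi'))  # second txt, columns a-c and g-i

merged = df1.merge(df2, how='outer', on=['a', 'b', 'c'])  # every key, NaN where one file has no row
result = merged.dropna()  # only the keys present in both files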
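
To check the approach against the data in the question, here is a minimal, self-contained sketch. It assumes the dummy column names a to i from the answer and reads the sample rows from in-memory strings (io.StringIO) instead of the real files, so the variable names FILE1, FILE2 and keys are purely illustrative. It also shows why pd.concat with axis=1 does not give the desired result: it lines rows up by index position, not by the values in the first three columns.

```python
import io
import pandas as pd

# Sample rows copied from the question (stand-ins for the two txt files).
FILE1 = """\
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091
"""
FILE2 = """\
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322
"""

keys = ['a', 'b', 'c']  # dummy names for the three shared columns
df1 = pd.read_csv(io.StringIO(FILE1), header=None, names=list('abcdef'))
df2 = pd.read_csv(io.StringIO(FILE2), header=None, names=list('abcghi'))

# concat(axis=1) pairs row 0 with row 0, row 1 with row 1, and so on,
# regardless of whether the key columns match.
positional = pd.concat([df1, df2], ignore_index=True, axis=1)
print(positional.shape)  # (8, 12)

# Outer merge: one row per distinct key, NaN where a file has no such row.
outer = df1.merge(df2, how='outer', on=keys).sort_values(keys).reset_index(drop=True)
print(len(outer))  # 11

# Inner merge: only the keys found in both files.
inner = df1.merge(df2, how='inner', on=keys)
print(len(inner))  # 4
```

For this data, how='inner' keeps the same four keys as the outer merge followed by dropna(). The two would differ only if the input files themselves already contained NaN values, since dropna() would remove those rows as well.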

