Merge two data frames on three columns in Python

Problem description

I have two data frames and I would like to merge them on the two columns Latitude and Longitude. The resulting df should include all columns. df1:

            Date  Latitude  Longitude        LST
0     2019-01-01     66.33     17.100  -8.010004
1     2019-01-09     66.33     17.100  -6.675005
2     2019-01-17     66.33     17.100 -21.845003
3     2019-01-25     66.33     17.100 -26.940004
4     2019-02-02     66.33     17.100 -23.035009
...   ...            ...       ...    ...

and df2:

     Station_Number       Date  Latitude  Longitude  Elevation      Value
0       CA002100636 2019-01-01   69.5667  -138.9167        1.0 -18.300000
1       CA002100636 2019-01-09   69.5667  -138.9167        1.0 -26.871429
2       CA002100636 2019-01-17   69.5667  -138.9167        1.0 -19.885714
3       CA002100636 2019-01-25   69.5667  -138.9167        1.0 -17.737500
4       CA002100636 2019-02-02   69.5667  -138.9167        1.0 -13.787500
...             ...        ...       ...        ...        ...        ...

I have tried LST_1 = pd.merge(df1, df2, how='inner'), but merging that way drops several data points that are present in both data frames.

Tags: python, pandas

Solution


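I'm not sure whether you want to merge on specific columns. If so, you need to pick a column whose values overlap between the two frames, for example the "Date" column.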

df_ = pd.merge(df1, df2, on="Date")
print(df_)
     Date  Latitude_x  Longitude_x  ... Longitude_y Elevation        Value
0  01.01.2019       66.33         17.1  ...    -138.9167       1.0  -18.300000
1  09.01.2019       66.33         17.1  ...    -138.9167       1.0  -26.871429
2  17.01.2019       66.33         17.1  ...    -138.9167       1.0  -19.885714
3  25.01.2019       66.33         17.1  ...    -138.9167       1.0  -17.737500
4  02.02.2019       66.33         17.1  ...    -138.9167       1.0  -13.787500

[5 rows x 9 columns]

df_.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            5 non-null      object 
 1   Latitude_x      5 non-null      float64
 2   Longitude_x     5 non-null      float64
 3   LST             5 non-null      object 
 4   Station_Number  5 non-null      object 
 5   Latitude_y      5 non-null      int64  
 6   Longitude_y     5 non-null      int64  
 7   Elevation       5 non-null      float64
 8   Value           5 non-null      object 

dtypes: float64(3), int64(2), object(4)
memory usage: 400.0+ bytes

Because both frames contain columns with the same names, pandas appends the suffixes _x and _y to Latitude and Longitude.
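As a minimal sketch of the suffix behaviour described above, the snippet below rebuilds two small frames (hypothetical rows modeled on the question's data, not the full data set). It first merges on Date with explicit suffixes, then merges on Date, Latitude and Longitude together. The suffix names `_lst` and `_station` are illustrative choices. The outer join with `indicator=True` is one way to see which rows fail to match on all three keys, which may explain rows that disappear in an inner merge.

```python
import pandas as pd

# Hypothetical sample rows modeled on the question's df1 and df2
df1 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [66.33, 66.33],
    "Longitude": [17.1, 17.1],
    "LST": [-8.010004, -6.675005],
})
df2 = pd.DataFrame({
    "Station_Number": ["CA002100636", "CA002100636"],
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167],
    "Elevation": [1.0, 1.0],
    "Value": [-18.3, -26.871429],
})

# Merge on Date only; name the suffixes instead of relying on _x / _y
by_date = pd.merge(df1, df2, on="Date", suffixes=("_lst", "_station"))
print(by_date.columns.tolist())

# Merge on all three shared columns. The outer join keeps unmatched rows,
# and indicator=True adds a _merge column with the values
# 'both', 'left_only' or 'right_only'.
by_all = pd.merge(
    df1, df2,
    on=["Date", "Latitude", "Longitude"],
    how="outer",
    indicator=True,
)
print(by_all["_merge"].value_counts())
```

With these sample rows the coordinates differ between the frames, so no row is marked 'both'. An inner merge on the same three keys would therefore return an empty frame.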

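If you want all columns, with each row's data kept independent of the other frame, you can use pd.concat. However, this produces NaN values wherever a column exists in only one of the frames.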

df_1 = pd.concat([df1, df2])
print(df_1)
         Date  Latitude  Longitude  ... Station_Number Elevation        Value
0  01.01.2019     66.33       17.1  ...            NaN       NaN          NaN
1  09.01.2019     66.33       17.1  ...            NaN       NaN          NaN
2  17.01.2019     66.33       17.1  ...            NaN       NaN          NaN
3  25.01.2019     66.33       17.1  ...            NaN       NaN          NaN
4  02.02.2019     66.33       17.1  ...            NaN       NaN          NaN
0  01.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -18.300000
1  09.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -26.871429
2  17.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -19.885714
3  25.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -17.737500
4  02.02.2019     69.56  -138.9167  ...    CA002100636       1.0   -13.787500

df_1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 4
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            10 non-null     object 
 1   Latitude        10 non-null     float64
 2   Longitude       10 non-null     float64
 3   LST             5 non-null      object 
 4   Station_Number  5 non-null      object 
 5   Elevation       5 non-null      float64
 6   Value           5 non-null      object 
dtypes: float64(3), object(4)
memory usage: 640.0+ bytes
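The concatenated frame above keeps each input's original index, so the labels 0 to 4 appear twice. A small sketch, using hypothetical two-row frames modeled on the question's data, shows how `ignore_index=True` renumbers the rows and how to count the NaN values that concatenation introduces:

```python
import pandas as pd

# Hypothetical sample rows modeled on the question's df1 and df2
df1 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [66.33, 66.33],
    "Longitude": [17.1, 17.1],
    "LST": [-8.010004, -6.675005],
})
df2 = pd.DataFrame({
    "Station_Number": ["CA002100636", "CA002100636"],
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167],
    "Elevation": [1.0, 1.0],
    "Value": [-18.3, -26.871429],
})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating
# each input frame's own index
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.index.tolist())

# Columns that exist in only one frame are NaN for the other frame's rows
print(stacked.isna().sum())
```

Note that concat only stacks rows. It does not pair an LST value with a station Value the way a merge on a shared key does.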
