python - Merge two data frames on three columns in Python
Problem description
I have two data frames and I would like to merge them on the two columns Latitude and Longitude. The resulting df should include all columns.
df1:
Date Latitude Longitude LST
0 2019-01-01 66.33 17.100 -8.010004
1 2019-01-09 66.33 17.100 -6.675005
2 2019-01-17 66.33 17.100 -21.845003
3 2019-01-25 66.33 17.100 -26.940004
4 2019-02-02 66.33 17.100 -23.035009
... ... ... ... ...
and df2:
Station_Number Date Latitude Longitude Elevation Value
0 CA002100636 2019-01-01 69.5667 -138.9167 1.0 -18.300000
1 CA002100636 2019-01-09 69.5667 -138.9167 1.0 -26.871429
2 CA002100636 2019-01-17 69.5667 -138.9167 1.0 -19.885714
3 CA002100636 2019-01-25 69.5667 -138.9167 1.0 -17.737500
4 CA002100636 2019-02-02 69.5667 -138.9167 1.0 -13.787500
... ... ... ... ... ... ...
I have tried:
LST_1 = pd.merge(df1, df2, how='inner')
but merging this way drops several data points that are present in both data frames.
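Solution
I'm not sure whether you want to merge on a specific column. If so, you need to pick a column whose values overlap between the two frames, for example the Date column.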
df_ = pd.merge(df1, df2, on="Date")
print(df_)
Date Latitude_x Longitude_x ... Longitude_y Elevation Value
0 01.01.2019 66.33 17.1 ... -138.9167 1.0 -18.300000
1 09.01.2019 66.33 17.1 ... -138.9167 1.0 -26.871429
2 17.01.2019 66.33 17.1 ... -138.9167 1.0 -19.885714
3 25.01.2019 66.33 17.1 ... -138.9167 1.0 -17.737500
4 02.02.2019 66.33 17.1 ... -138.9167 1.0 -13.787500
[5 rows x 9 columns]
df_.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 5 non-null object
1 Latitude_x 5 non-null float64
2 Longitude_x 5 non-null float64
3 LST 5 non-null object
4 Station_Number 5 non-null object
5 Latitude_y 5 non-null int64
6 Longitude_y 5 non-null int64
7 Elevation 5 non-null float64
8 Value 5 non-null object
dtypes: float64(3), int64(2), object(4)
memory usage: 400.0+ bytes
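Because the two frames share column names, pandas adds the suffixes _x and _y to Latitude and Longitude.
If you want all columns, with each row's data kept independent of the other frame, you can use pd.concat. However, this creates some NaN values where one frame has no data for a column.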
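As a rough sketch of how the key columns affect the result, the snippet below compares a merge with no `on` argument, a merge on an explicit list of keys, and a merge on Date alone with custom suffixes. The two small frames and their values are made up for illustration and are not the asker's data. The `suffixes` and `indicator` parameters are standard `pd.merge` options. An outer merge with `indicator=True` is one way to see which rows fail to find a partner.

```python
import pandas as pd

# Hypothetical miniature frames; the values are illustrative only.
df1 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09", "2019-01-17"],
    "Latitude": [66.33, 66.33, 66.33],
    "Longitude": [17.1, 17.1, 17.1],
    "LST": [-8.0, -6.7, -21.8],
})
df2 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09", "2019-01-25"],
    "Latitude": [66.33, 69.5667, 66.33],
    "Longitude": [17.1, -138.9167, 17.1],
    "Value": [-18.3, -26.9, -17.7],
})

keys = ["Date", "Latitude", "Longitude"]

# With no `on`, merge joins on every column name the two frames share,
# so a row survives an inner merge only if all shared columns match.
default_merge = pd.merge(df1, df2, how="inner")

# Naming the keys explicitly gives the same rows and documents the intent.
explicit = pd.merge(df1, df2, on=keys, how="inner")

# Joining on Date only keeps both coordinate pairs; `suffixes` replaces
# the default _x/_y with more descriptive labels.
by_date = pd.merge(df1, df2, on="Date", suffixes=("_lst", "_station"))

# An outer merge with indicator=True adds a `_merge` column that marks each
# row as 'both', 'left_only' or 'right_only'.
audit = pd.merge(df1, df2, on=keys, how="outer", indicator=True)
unmatched = audit[audit["_merge"] != "both"]

print(by_date.columns.tolist())
print(unmatched)
```

Rows listed in `unmatched` are the ones an inner merge would drop. Inspecting them can show whether a key differs slightly between the frames, for instance in coordinate precision or date format.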
df_1 = pd.concat([df1, df2])
print(df_1)
Date Latitude Longitude ... Station_Number Elevation Value
0 01.01.2019 66.33 17.1 ... NaN NaN NaN
1 09.01.2019 66.33 17.1 ... NaN NaN NaN
2 17.01.2019 66.33 17.1 ... NaN NaN NaN
3 25.01.2019 66.33 17.1 ... NaN NaN NaN
4 02.02.2019 66.33 17.1 ... NaN NaN NaN
0 01.01.2019 69.56 -138.9167 ... CA002100636 1.0 -18.300000
1 09.01.2019 69.56 -138.9167 ... CA002100636 1.0 -26.871429
2 17.01.2019 69.56 -138.9167 ... CA002100636 1.0 -19.885714
3 25.01.2019 69.56 -138.9167 ... CA002100636 1.0 -17.737500
4 02.02.2019 69.56 -138.9167 ... CA002100636 1.0 -13.787500
df_1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 4
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 10 non-null object
1 Latitude 10 non-null float64
2 Longitude 10 non-null float64
3 LST 5 non-null object
4 Station_Number 5 non-null object
5 Elevation 5 non-null float64
6 Value 5 non-null object
dtypes: float64(3), object(4)
memory usage: 640.0+ bytes
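The output above has two points worth a small illustration: the index repeats (0 to 4 twice), and columns present in only one frame are filled with NaN. A minimal sketch, again with made-up frames rather than the asker's data, uses the `ignore_index=True` option of `pd.concat` to get a fresh index and counts the resulting gaps per column.

```python
import pandas as pd

# Hypothetical miniature frames; the values are illustrative only.
df1 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [66.33, 66.33],
    "Longitude": [17.1, 17.1],
    "LST": [-8.0, -6.7],
})
df2 = pd.DataFrame({
    "Station_Number": ["CA002100636", "CA002100636"],
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167],
    "Value": [-18.3, -26.9],
})

# Stack the rows of both frames; ignore_index=True renumbers the rows
# 0..n-1 instead of repeating each frame's original index.
stacked = pd.concat([df1, df2], ignore_index=True)

# Columns that exist in only one frame are NaN for the other frame's rows.
missing_per_column = stacked.isna().sum()

print(stacked)
print(missing_per_column)
```

Note that concat only stacks rows and does not align a satellite record with a station record, so it suits cases where the two sources should stay as separate observations.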