Pandas left join returns a larger matrix and does not work

Problem description

I have 2 DataFrames. The first, "station_anal", is shown below:

        count   Start station number
index       
31623   17105   31623
31258   11432   31258
31201   10194   31201
31200   9505    31200
31247   9145    31247

The second DataFrame, "vt", is:

    Start station number    Start station
0   31214                   17th & Corcoran St NW
1   31104                   Adams Mill & Columbia Rd NW
2   31221                   18th & M St NW
3   31111                   10th & U St NW
4   31260                   23rd & E St NW

station_anal has shape 486x2.

vt has shape 8000x2.

My left join command is:

lj = pd.merge(station_anal, vt, how = 'left', on = 'Start station number')

The join column has the same dtype, int64, in both DataFrames.

But lj returns:

lj.head()

count   Start station number    Start station
0   17105   31623   Columbus Circle / Union Station
1   17105   31623   Columbus Circle / Union Station
2   17105   31623   Columbus Circle / Union Station
3   17105   31623   Columbus Circle / Union Station
4   17105   31623   Columbus Circle / Union Station

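with shape 8000x3.

This makes no sense to me: my understanding is that the result of a left join always has as many rows as the first DataFrame, 486 in this case.

Tags: python, pandas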

Solution

Let's use map:

station_anal['Start station'] = station_anal['Start station number']\
    .map(vt.set_index('Start station number')['Start station'])
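
A left join emits one output row per matching row on the right, so repeated keys in vt would explain the 8000-row result. Below is a minimal sketch for checking this. The two small frames are made-up stand-ins for the real data (the values and the "Example Station" name are illustrative assumptions, not the asker's dataset):

```python
import pandas as pd

# Toy stand-ins for the real frames; values are illustrative only.
station_anal = pd.DataFrame({
    "count": [17105, 11432],
    "Start station number": [31623, 31258],
})
vt = pd.DataFrame({
    "Start station number": [31623, 31623, 31623, 31258],
    "Start station": [
        "Columbus Circle / Union Station",
        "Columbus Circle / Union Station",
        "Columbus Circle / Union Station",
        "Example Station",
    ],
})

key = "Start station number"

# Count rows in vt whose key has already appeared in an earlier row.
n_dupes = int(vt[key].duplicated().sum())

# A left merge keeps every left row but repeats it once per match in vt.
lj = pd.merge(station_anal, vt, how="left", on=key)

print("duplicate keys in vt:", n_dupes)
print("rows on the left:", len(station_anal), "rows after merge:", len(lj))
```

If the duplicate count is nonzero, the merged result can be longer than the left frame, and the keys in vt need to be made unique first.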

Update: drop duplicates, then map:

mapper = vt.drop_duplicates('Start station number')\
           .set_index('Start station number')['Start station']

station_anal['Start Station'] = station_anal['Start station number']\
                                     .map(mapper)
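
If a merged frame is preferred over a mapped column, the same de-duplication can be applied before merging. The sketch below is one possible way to do that, again on made-up toy data (illustrative values only). It passes `validate="many_to_one"` to `pd.merge`, which raises an error if the keys on the right are not unique. Note that `drop_duplicates` keeps the first row per key, so it assumes each station number maps to a single name.

```python
import pandas as pd

# Toy stand-ins for the real frames; values are illustrative only.
station_anal = pd.DataFrame({
    "count": [17105, 11432, 10194],
    "Start station number": [31623, 31258, 31201],
})
vt = pd.DataFrame({
    "Start station number": [31623, 31623, 31258, 31258],
    "Start station": [
        "Columbus Circle / Union Station",
        "Columbus Circle / Union Station",
        "Example Station",
        "Example Station",
    ],
})

key = "Start station number"

# Keep one row per station number (the first occurrence).
vt_unique = vt.drop_duplicates(subset=key)

# validate="many_to_one" makes pandas check that the right-hand keys are unique.
lj = pd.merge(station_anal, vt_unique, how="left", on=key,
              validate="many_to_one")

print(len(station_anal), len(lj))
print(lj)
```

With unique keys on the right, the left join has exactly as many rows as station_anal, and station numbers missing from vt get NaN in the "Start station" column.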
