首页 > 解决方案 > 合并数据框问题

问题描述

我有 2 个数据框,

第一个是

+------------------------------------------+
|       ID             CustomerType Choice |
+------------------------------------------+
| 0    1.0               Durability    OEM |
| 1    2.0                    Price    OEM |
| 2    3.0               Durability    OEM |
| 3    4.0               Durability    OEM |
| 4    5.0               Durability    OEM |
| 5    6.0  ManufacturerCredibility    OEM |
| 6    7.0                 Warranty    OEM |
| 7    8.0  ManufacturerCredibility    OEM |
| 8    9.0               Durability    OEM |
| 9   10.0                    Price    OEM |
| 10  11.0               Durability    TPN |
| 11  12.0                 Warranty    OEM |
| 12  13.0               Durability    TPN |
+------------------------------------------+

第二个是;

 --------------------------------------------------------+
|        Price  Durability  Warranty  Manufacture   Type |
+--------------------------------------------------------+
| OEM     1.00         4.0       4.0          4.0    OEM |
| TPN     0.80         4.0       1.0          1.0    TPN |
| Reman   0.55         4.0       0.5          1.0  Reman |
| Reuse   0.45         2.5       0.0          1.0  Reuse |
+--------------------------------------------------------+

我需要使用第一个数据帧中的“选择”和第二个数据帧中的“类型”来连接这两个数据帧。

目前我正在使用

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')

并有这样尴尬的结果。

+------------------------------------------------------------------------------+
|     Price  Durability  Warranty  Manufacture             CustomerType Choice |
+------------------------------------------------------------------------------+
| 0     1.0         4.0       4.0          4.0               Durability    OEM |
| 1     1.0         4.0       4.0          4.0                    Price    OEM |
| 2     1.0         4.0       4.0          4.0               Durability    OEM |
| 3     1.0         4.0       4.0          4.0               Durability    OEM |
| 4     1.0         4.0       4.0          4.0               Durability    OEM |
| 5     1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
| 6     1.0         4.0       4.0          4.0                 Warranty    OEM |
| 7     1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
| 8     1.0         4.0       4.0          4.0               Durability    OEM |
| 9     1.0         4.0       4.0          4.0                    Price    OEM |
| 10    1.0         4.0       4.0          4.0                 Warranty    OEM |
| 11    1.0         4.0       4.0          4.0                    Price    OEM |
| 12    1.0         4.0       4.0          4.0                 Warranty    OEM |
| 13    1.0         4.0       4.0          4.0  ManufacturerCredibility    OEM |
+------------------------------------------------------------------------------+

从结果表中我们可以看到所有行都包含来自第二个数据帧的 OEM 数据。我在这里做错了什么?

标签: pythonpandasnumpydataframe

解决方案


我从您的示例数据中得到了一些不同的输出,最后3.rd一行和最后一行正确合并,也left join没有通过合并列进行排序:

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
print (data)
      ID             CustomerType Choice  ...   Warranty  Manufacture  Type
0    1.0               Durability    OEM  ...        4.0          4.0   OEM
1    2.0                    Price    OEM  ...        4.0          4.0   OEM
2    3.0               Durability    OEM  ...        4.0          4.0   OEM
3    4.0               Durability    OEM  ...        4.0          4.0   OEM
4    5.0               Durability    OEM  ...        4.0          4.0   OEM
5    6.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
6    7.0                 Warranty    OEM  ...        4.0          4.0   OEM
7    8.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
8    9.0               Durability    OEM  ...        4.0          4.0   OEM
9   10.0                    Price    OEM  ...        4.0          4.0   OEM
10  11.0               Durability    TPN  ...        1.0          1.0   TPN
11  12.0                 Warranty    OEM  ...        4.0          4.0   OEM
12  13.0               Durability    TPN  ...        1.0          1.0   TPN

[13 rows x 8 columns]

如果排序似乎使用inner join(默认):

data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type')
#same as
#data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='inner')
print (data)
      ID             CustomerType Choice  ...   Warranty  Manufacture  Type
0    1.0               Durability    OEM  ...        4.0          4.0   OEM
1    2.0                    Price    OEM  ...        4.0          4.0   OEM
2    3.0               Durability    OEM  ...        4.0          4.0   OEM
3    4.0               Durability    OEM  ...        4.0          4.0   OEM
4    5.0               Durability    OEM  ...        4.0          4.0   OEM
5    6.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
6    7.0                 Warranty    OEM  ...        4.0          4.0   OEM
7    8.0  ManufacturerCredibility    OEM  ...        4.0          4.0   OEM
8    9.0               Durability    OEM  ...        4.0          4.0   OEM
9   10.0                    Price    OEM  ...        4.0          4.0   OEM
10  12.0                 Warranty    OEM  ...        4.0          4.0   OEM
11  11.0               Durability    TPN  ...        1.0          1.0   TPN
12  13.0               Durability    TPN  ...        1.0          1.0   TPN

[13 rows x 8 columns]

推荐阅读