python - 合并数据框问题
问题描述
我有 2 个数据框,
第一个是
+------------------------------------------+
| ID CustomerType Choice |
+------------------------------------------+
| 0 1.0 Durability OEM |
| 1 2.0 Price OEM |
| 2 3.0 Durability OEM |
| 3 4.0 Durability OEM |
| 4 5.0 Durability OEM |
| 5 6.0 ManufacturerCredibility OEM |
| 6 7.0 Warranty OEM |
| 7 8.0 ManufacturerCredibility OEM |
| 8 9.0 Durability OEM |
| 9 10.0 Price OEM |
| 10 11.0 Durability TPN |
| 11 12.0 Warranty OEM |
| 12 13.0 Durability TPN |
+------------------------------------------+
第二个是;
--------------------------------------------------------+
| Price Durability Warranty Manufacture Type |
+--------------------------------------------------------+
| OEM 1.00 4.0 4.0 4.0 OEM |
| TPN 0.80 4.0 1.0 1.0 TPN |
| Reman 0.55 4.0 0.5 1.0 Reman |
| Reuse 0.45 2.5 0.0 1.0 Reuse |
+--------------------------------------------------------+
我需要使用第一个数据帧中的“选择”和第二个数据帧中的“类型”来连接这两个数据帧。
目前我正在使用
data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
并有这样尴尬的结果。
+------------------------------------------------------------------------------+
| Price Durability Warranty Manufacture CustomerType Choice |
+------------------------------------------------------------------------------+
| 0 1.0 4.0 4.0 4.0 Durability OEM |
| 1 1.0 4.0 4.0 4.0 Price OEM |
| 2 1.0 4.0 4.0 4.0 Durability OEM |
| 3 1.0 4.0 4.0 4.0 Durability OEM |
| 4 1.0 4.0 4.0 4.0 Durability OEM |
| 5 1.0 4.0 4.0 4.0 ManufacturerCredibility OEM |
| 6 1.0 4.0 4.0 4.0 Warranty OEM |
| 7 1.0 4.0 4.0 4.0 ManufacturerCredibility OEM |
| 8 1.0 4.0 4.0 4.0 Durability OEM |
| 9 1.0 4.0 4.0 4.0 Price OEM |
| 10 1.0 4.0 4.0 4.0 Warranty OEM |
| 11 1.0 4.0 4.0 4.0 Price OEM |
| 12 1.0 4.0 4.0 4.0 Warranty OEM |
| 13 1.0 4.0 4.0 4.0 ManufacturerCredibility OEM |
+------------------------------------------------------------------------------+
从结果表中我们可以看到所有行都包含来自第二个数据帧的 OEM 数据。我在这里做错了什么?
解决方案
我从您的示例数据中得到了一些不同的输出,最后3.rd
一行和最后一行正确合并,也left join
没有通过合并列进行排序:
data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='left')
print (data)
ID CustomerType Choice ... Warranty Manufacture Type
0 1.0 Durability OEM ... 4.0 4.0 OEM
1 2.0 Price OEM ... 4.0 4.0 OEM
2 3.0 Durability OEM ... 4.0 4.0 OEM
3 4.0 Durability OEM ... 4.0 4.0 OEM
4 5.0 Durability OEM ... 4.0 4.0 OEM
5 6.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
6 7.0 Warranty OEM ... 4.0 4.0 OEM
7 8.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
8 9.0 Durability OEM ... 4.0 4.0 OEM
9 10.0 Price OEM ... 4.0 4.0 OEM
10 11.0 Durability TPN ... 1.0 1.0 TPN
11 12.0 Warranty OEM ... 4.0 4.0 OEM
12 13.0 Durability TPN ... 1.0 1.0 TPN
[13 rows x 8 columns]
如果排序似乎使用inner join
(默认):
data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type')
#same as
#data = pd.merge(survey,rel_attr, left_on = 'Choice', right_on = 'Type',how='inner')
print (data)
ID CustomerType Choice ... Warranty Manufacture Type
0 1.0 Durability OEM ... 4.0 4.0 OEM
1 2.0 Price OEM ... 4.0 4.0 OEM
2 3.0 Durability OEM ... 4.0 4.0 OEM
3 4.0 Durability OEM ... 4.0 4.0 OEM
4 5.0 Durability OEM ... 4.0 4.0 OEM
5 6.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
6 7.0 Warranty OEM ... 4.0 4.0 OEM
7 8.0 ManufacturerCredibility OEM ... 4.0 4.0 OEM
8 9.0 Durability OEM ... 4.0 4.0 OEM
9 10.0 Price OEM ... 4.0 4.0 OEM
10 12.0 Warranty OEM ... 4.0 4.0 OEM
11 11.0 Durability TPN ... 1.0 1.0 TPN
12 13.0 Durability TPN ... 1.0 1.0 TPN
[13 rows x 8 columns]
推荐阅读
- c++ - 我可以使用 lambda 来简化 for 循环吗
- vector - 可以从迭代器中收集 &mut 吗?
- asp.net-mvc - IIS URL 重写匹配最佳实践
- javascript - 在 Bootstrap 3 中使用 tagsinput 和 typeahead
- php - MySQLi 错误 - 两次相同的输出
- arrays - 创建行数组VBA
- azure - Azure 数据工厂 ARM 模板
- python-3.x - python-snappy 循环流式传输数据到客户端
- ios - 在 iOS 订阅中,verifyReceipt 每天/每小时/等的调用频率是否有限制?
- amazon-web-services - 为什么某些 AWS IAM 操作需要所有资源