python - Pandas merge produces duplicates on the customerEmail column

The goal is to detect fraud from this dataset.

I have two DataFrames with the following columns:

DF1[customerEmail, customerphone, customerdevice, customeripadd, NoOftransactions, Fraud] etc., shape (168, 11)

DF2[customerEmail, transactionid, payment methods, orderstatus] etc., shape (623, 11)

The customerEmail column is common to both DataFrames, so it makes sense to merge the tables on customerEmail.

The problem is that DF2 contains duplicate customerEmail values, including some with no reference in DF1. So when I merge using:

DF3 = pd.merge(DF1, DF2, on='customerEmail')

the resulting shape is (819, 18), and the duplicated email IDs produce misleading data.
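A minimal sketch of why the row count grows, using made-up toy data (the emails, column values, and sizes below are hypothetical, not taken from the real dataset). With the default how='inner', the merge emits one row per matching pair, so an email that appears three times in DF2 yields three rows, while emails present in only one frame are dropped:

```python
import pandas as pd

# Hypothetical toy frames standing in for the real DF1 / DF2
DF1 = pd.DataFrame({
    "customerEmail": ["a@example.com", "b@example.com", "c@example.com"],
    "Fraud": [False, True, False],
})
DF2 = pd.DataFrame({
    "customerEmail": ["a@example.com", "a@example.com", "a@example.com",
                      "b@example.com", "z@example.com"],
    "transactionid": ["t1", "t2", "t3", "t4", "t5"],
})

# Default how='inner': one output row per (DF1 row, DF2 row) pair with equal keys
DF3 = pd.merge(DF1, DF2, on="customerEmail")

# How many times each email occurs in DF2 explains the fan-out
counts = DF2["customerEmail"].value_counts()

print(DF1.shape, DF2.shape, DF3.shape)
print(counts)
```

Here DF3 ends up with more rows than DF1 because a@example.com is repeated in DF2, and c@example.com disappears because it has no match.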

I want the match to be driven by the customerEmail values from DF1, so my final DataFrame DF3 should be equal to DF1.

Tags: python, pandas, merge, data-science

Try changing the how parameter to 'left'.

For example:

DF3 = DF1.merge(DF2, how='left', on='customerEmail')
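One caveat worth checking: how='left' keeps every row of DF1, but emails that repeat in DF2 still produce one output row per match, so the result can remain longer than DF1. A rough sketch of two possible ways around that, again on hypothetical toy data; which DF2 row to keep, or how to summarise the transactions, is an assumption that depends on the analysis:

```python
import pandas as pd

# Hypothetical toy frames standing in for the real DF1 / DF2
DF1 = pd.DataFrame({
    "customerEmail": ["a@example.com", "b@example.com", "c@example.com"],
    "Fraud": [False, True, False],
})
DF2 = pd.DataFrame({
    "customerEmail": ["a@example.com", "a@example.com", "a@example.com",
                      "b@example.com", "z@example.com"],
    "transactionid": ["t1", "t2", "t3", "t4", "t5"],
})

# Plain left merge: all DF1 emails survive, but repeated keys in DF2 still fan out
left = DF1.merge(DF2, how="left", on="customerEmail")

# Option A (assumption: any single DF2 row per email is good enough):
# keep the first DF2 row for each email, then merge
DF2_first = DF2.drop_duplicates(subset="customerEmail", keep="first")
DF3 = DF1.merge(DF2_first, how="left", on="customerEmail",
                validate="one_to_one")  # raises if keys are not unique on both sides

# Option B (assumption: a per-customer summary is what is wanted):
# collapse DF2 to one row per email before merging
per_customer = DF2.groupby("customerEmail", as_index=False).agg(
    n_transactions=("transactionid", "count")
)
DF3_agg = DF1.merge(per_customer, how="left", on="customerEmail",
                    validate="one_to_one")

print(len(DF1), len(left), len(DF3), len(DF3_agg))
```

In both options the merged frame has exactly as many rows as DF1, with missing values for DF1 emails that never appear in DF2. For fraud detection, the aggregated variant may be more useful since it does not silently discard transactions, though that is a judgment call.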

If that doesn't work, we may need more information.