Python merge: non-matching records also need to be present

Problem description

I have two files.

input.csv

11/13/2020 07:41:09 TREE count1: id1 green001
11/13/2020 07:43:09 TREE count1: id1 black001
11/13/2020 07:45:09 TREE count1: id2 black001
11/13/2020 07:45:09 PLAN count1: id3 green002
11/13/2020 07:45:09 PLAN count1: id4 green004

lookup.csv

ID,item,message
id1,item1,message 1
id2,item2,message 2
id3,item3,message 3

I am trying to merge these two files and expect the output below. Expected output:

Time,Type,counts,id,item,message,colour
11/13/2020 07:41:09,TREE,count1,id1,item1,message 1,green001
11/13/2020 07:43:09,TREE,count1,id1,item1,message 1,black001
11/13/2020 07:45:09,TREE,count1,id2,item2,message 2,black001
11/13/2020 07:45:09,PLAN,count1,id3,item3,message 3,green002
11/13/2020 07:45:09,PLAN,count1,id4,     ,         ,green004

I can get the merge to work when the ID value exists in the lookup file. Code:

import pandas as pd

# read input and remove spurious : at end of count
input = pd.read_csv("input.csv", sep=' ',
         names=["date","time", "tree","count","ID", "info"])
input["count"] = input["count"].apply(lambda s:s[:-1])

# read lookup and merge
lookup = pd.read_csv("lookup.csv")
merged = input.merge(lookup, on="ID")

# collapse time and date to single column
merged["time"] = merged["date"] + " " + merged["time"]
del merged["date"]

# output
print(merged)
merged.to_csv("testme.csv", index=False)

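The code works fine if every ID value in input.csv also exists in lookup.csv, but it fails when an ID value is missing from lookup.csv: the row for that ID does not appear in the output.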

Any suggestions would be helpful.

Tags: python, pandas, python-2.7, dataframe, merge

Solution


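Try changing the "how" argument of the merge from 'inner' to 'left' or 'outer'. The default is 'inner', which keeps only the IDs that appear in both DataFrames. A 'left' merge keeps every row of the input and leaves the lookup columns empty (NaN) where there is no match; an 'outer' merge also keeps lookup rows that have no match in the input. You can additionally set the indicator flag, which adds a "_merge" column telling you which DataFrame each record came from.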

merged = input.merge(lookup, on="ID", how='left')

merged = input.merge(lookup, on="ID", how='outer', indicator=True)
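As a minimal sketch of how the left merge could produce the expected output, the example below inlines the two sample files with io.StringIO so it runs on its own. The helper name merge_keep_unmatched, the renamed columns (Type, counts, colour) and the choice to replace NaN with empty strings are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Sample data copied from the question, inlined so the sketch is self-contained.
INPUT_TEXT = """\
11/13/2020 07:41:09 TREE count1: id1 green001
11/13/2020 07:43:09 TREE count1: id1 black001
11/13/2020 07:45:09 TREE count1: id2 black001
11/13/2020 07:45:09 PLAN count1: id3 green002
11/13/2020 07:45:09 PLAN count1: id4 green004
"""

LOOKUP_TEXT = """\
ID,item,message
id1,item1,message 1
id2,item2,message 2
id3,item3,message 3
"""


def merge_keep_unmatched(input_src, lookup_src):
    """Hypothetical helper: left-merge the input onto the lookup table.

    Returns the merged frame and the list of IDs missing from the lookup.
    """
    records = pd.read_csv(
        input_src, sep=" ",
        names=["date", "time", "Type", "counts", "ID", "colour"],
    )
    # Drop the trailing ':' from values such as 'count1:'.
    records["counts"] = records["counts"].str.rstrip(":")

    lookup = pd.read_csv(lookup_src)

    # how="left" keeps every input row; indicator=True adds a '_merge' column.
    merged = records.merge(lookup, on="ID", how="left", indicator=True)
    unmatched = merged.loc[merged["_merge"] == "left_only", "ID"].tolist()

    # Collapse date and time into one column and order columns as in the question.
    merged["Time"] = merged["date"] + " " + merged["time"]
    merged = merged[["Time", "Type", "counts", "ID", "item", "message", "colour"]]

    # Assumption: unmatched lookup fields should be written as empty strings.
    return merged.fillna(""), unmatched


result, unmatched = merge_keep_unmatched(
    io.StringIO(INPUT_TEXT), io.StringIO(LOOKUP_TEXT)
)
print(result.to_csv(index=False))
print("IDs missing from lookup:", unmatched)
```

With real files, the two io.StringIO objects can be replaced by the paths "input.csv" and "lookup.csv", and result.to_csv("testme.csv", index=False) writes the output as in the question.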
