Neo4j data tidying - removing duplicates

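Problem description

I really like Neo4j, but I'm stuck on this small problem. My graph has Movie nodes, Actor nodes, Year nodes and Company nodes. Some Movie nodes were duplicated by accidental misspellings, for example "Oceans Eleven" and "Ocean's Eleven" (note the apostrophe).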

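So my question is: how do I copy all the missing relationships from "Oceans Eleven" to "Ocean's Eleven", and then delete "Oceans Eleven"?

I need a script that I can run case by case. Don't worry about applying this to every movie with an apostrophe, because each misspelling is different.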

--

Update with sample data

MATCH (n) RETURN n LIMIT 25

Returns:

{"name":"Oceans Eleven","artwork":"oceans-eleven.jpg"}
{"name":"Brad Pitt","image":"brad-pitt.jpg"}
{"name":"George Clooney","image":"george-clooney.jpg"}
{"name":"Ocean's Eleven","artwork":"oceans-11.jpg"}
{"name":"2001"}
{"name":"Warner Brothers"}
{"name":"Matt Damon","image":"matt-damon.jpg"}
{"name":"Julia Roberts","image":"julia-roberts.jpg"}
{"name":"Netflix"}
{"name":"Andy Garcia","image":"andy-garcia.jpg"}

Tags: neo4j, duplicates

Solution


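// Copy each outgoing relationship of the misspelled node to the correct node,
// skipping any that the correct node already has (same type, same target).
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})-[r]->(destNode)
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
OPTIONAL MATCH (trueNode)-[r1]->(destNode)
WHERE type(r1) = type(r)
// r1 is null when trueNode lacks the relationship: [1] yields one row, [] none
WITH r, trueNode, destNode, CASE WHEN r1 IS NULL THEN [1] ELSE [] END AS lst
UNWIND lst AS x
CALL apoc.create.relationship(trueNode, type(r), properties(r), destNode)
YIELD rel
RETURN rel LIMIT 100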

The query above creates all the outgoing relationships.

// Same idea for incoming relationships: destNode is now the start node.
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})<-[r]-(destNode)
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
OPTIONAL MATCH (trueNode)<-[r1]-(destNode)
WHERE type(r1) = type(r)
// r1 is null when trueNode lacks the relationship: [1] yields one row, [] none
WITH r, trueNode, destNode, CASE WHEN r1 IS NULL THEN [1] ELSE [] END AS lst
UNWIND lst AS x
CALL apoc.create.relationship(destNode, type(r), properties(r), trueNode)
YIELD rel
RETURN rel LIMIT 100

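This one handles the incoming relationships. After running both queries for each misspelling, you can delete all the incorrect nodes with the following: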

MATCH (movie:Movie) WHERE movie.name IN ["Oceans Eleven","..",".."] 
DETACH DELETE movie
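
Before the DETACH DELETE is run, it can be worth confirming that nothing remains to be copied. The query below is only a sketch in plain Cypher (no APOC), assuming the same two movie names as above; the aliases dup, good and other are made up for illustration. It lists every relationship of the misspelled node for which the correct node has no relationship of the same type to the same neighbour, so an empty result suggests the copy is complete.

```cypher
// Outgoing relationships of the duplicate that the correct node still lacks
MATCH (dup:Movie {name: "Oceans Eleven"})-[r]->(other)
MATCH (good:Movie {name: "Ocean's Eleven"})
OPTIONAL MATCH (good)-[r1]->(other) WHERE type(r1) = type(r)
WITH r, other, r1 WHERE r1 IS NULL
RETURN "outgoing" AS direction, type(r) AS relType, other.name AS otherNode
UNION ALL
// Incoming relationships of the duplicate that the correct node still lacks
MATCH (dup:Movie {name: "Oceans Eleven"})<-[r]-(other)
MATCH (good:Movie {name: "Ocean's Eleven"})
OPTIONAL MATCH (good)<-[r1]-(other) WHERE type(r1) = type(r)
WITH r, other, r1 WHERE r1 IS NULL
RETURN "incoming" AS direction, type(r) AS relType, other.name AS otherNode
```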

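If the duplicate node's relationships are entirely different from those of the original node, the existence check can be left out of the scripts above, that is, the part running from OPTIONAL MATCH through UNWIND:

OPTIONAL MATCH (trueNode)-[r1]->(destNode)
WHERE type(r1) = type(r)
WITH r, trueNode, destNode, CASE WHEN r1 IS NULL THEN [1] ELSE [] END AS lst
UNWIND lst AS x

so that trueNode is bound by a plain match only:

MATCH (trueNode:Movie{name:"Ocean's Eleven"})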
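
As an alternative to copying relationships by hand, APOC also ships a refactoring procedure, apoc.refactor.mergeNodes, which merges a list of nodes into the first one and removes the others. This is not the approach used in the answer above, just a sketch under the assumption that APOC is installed and that exactly one node matches each name. The exact handling of properties and duplicate relationships may vary between APOC versions, so it is worth trying on a copy of the data first.

```cypher
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})
// Nodes are merged into the first list element, so trueNode comes first.
// properties: "discard" keeps trueNode's own values (name, artwork);
// mergeRels: true collapses relationships with the same type and direction.
CALL apoc.refactor.mergeNodes([trueNode, duplicateNode],
     {properties: "discard", mergeRels: true})
YIELD node
RETURN node
```

With this route the separate DETACH DELETE step should not be needed for the merged node, since the duplicate is removed as part of the merge.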
