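neo4j - Neo4j data cleanup - removing duplicates
Question
I really like Neo4j, but I'm stuck on this small problem. My graph has Movie nodes, Actor nodes, Year nodes and Company nodes. Some Movie nodes were duplicated because of accidental misspellings, for example "Oceans Eleven" and "Ocean's Eleven" (note the apostrophe).
So my question is: how do I copy all the missing relationships from "Oceans Eleven" to "Ocean's Eleven", and then delete "Oceans Eleven"?
I need a script that I can run case by case. Don't worry about applying this to every movie with an apostrophe, because each spelling mistake is different.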
--
Update with sample data
MATCH (n) RETURN n LIMIT 25
Returns:
{"name":"Oceans Eleven","artwork":"oceans-eleven.jpg"}
{"name":"Brad Pitt","image":"brad-pitt.jpg"}
{"name":"George Clooney","image":"george-clooney.jpg"}
{"name":"Ocean's Eleven","artwork":"oceans-11.jpg"}
{"name":"2001"}
{"name":"Warner Brothers"}
{"name":"Matt Damon","image":"matt-damon.jpg"}
{"name":"Julia Roberts","image":"julia-roberts.jpg"}
{"name":"Netflix"}
{"name":"Andy Garcia","image":"andy-garcia.jpg"}
Answer
// Bind the correctly spelled node once, up front.
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
// Every outgoing relationship of the misspelled node.
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})-[r]->(destNode)
// Does trueNode already have a relationship of the same type to destNode?
OPTIONAL MATCH (trueNode)-[r1]->(destNode)
WHERE type(r1) = type(r)
WITH trueNode, destNode, r, count(r1) AS existing
WHERE existing = 0
// Copy only the relationships that are missing on trueNode.
CALL apoc.create.relationship(trueNode, type(r), properties(r), destNode)
YIELD rel
RETURN rel
The query above creates all the missing outgoing relationships.
// Bind the correctly spelled node once, up front.
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
// Every incoming relationship of the misspelled node.
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})<-[r]-(destNode)
// Does destNode already have a relationship of the same type to trueNode?
OPTIONAL MATCH (trueNode)<-[r1]-(destNode)
WHERE type(r1) = type(r)
WITH trueNode, destNode, r, count(r1) AS existing
WHERE existing = 0
// Copy only the relationships that are missing on trueNode.
CALL apoc.create.relationship(destNode, type(r), properties(r), trueNode)
YIELD rel
RETURN rel
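This one handles the incoming relationships. After running both queries for each misspelling, you can run the following to delete all the incorrect nodes: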
MATCH (movie:Movie) WHERE movie.name IN ["Oceans Eleven","..",".."]
DETACH DELETE movie
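As an alternative to the copy-then-delete steps above, the same cleanup can probably be done in one call with APOC's apoc.refactor.mergeNodes. This is a sketch, not part of the original answer: it assumes APOC is installed, that the first node in the list is the one you want to keep, and that the duplicate's properties (such as its artwork) should be discarded.

```cypher
// Sketch only: merge the misspelled node into the correctly spelled one.
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})
// The first node in the list survives; the others are merged into it.
// properties: "discard" keeps trueNode's own property values.
// mergeRels: true moves the relationships over and merges identical ones.
CALL apoc.refactor.mergeNodes([trueNode, duplicateNode],
     {properties: "discard", mergeRels: true})
YIELD node
RETURN node
```

Because the duplicate is removed by the merge itself, no separate DETACH DELETE is needed for that pair. Try it on a copy of the data first, since the merge cannot be undone once committed.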
If the duplicate node's relationships are completely different from those of the correct node, the check for existing relationships is unnecessary. In both scripts above you can drop the OPTIONAL MATCH and the filtering lines that follow it, up to the CALL, so that trueNode is simply matched with:
MATCH (trueNode:Movie{name:"Ocean's Eleven"})
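Put together, the simplified outgoing-relationship script might look like the sketch below. It assumes the two nodes share no relationships at all. If that assumption is wrong, it will create duplicate relationships on trueNode.

```cypher
// Sketch only: copy every outgoing relationship without checking for duplicates.
MATCH (trueNode:Movie {name: "Ocean's Eleven"})
MATCH (duplicateNode:Movie {name: "Oceans Eleven"})-[r]->(destNode)
CALL apoc.create.relationship(trueNode, type(r), properties(r), destNode)
YIELD rel
RETURN rel
```

The incoming variant is the same with the pattern reversed to <-[r]- and the first and last arguments of apoc.create.relationship swapped.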