Neo4j Community Edition: optimizing parallel connection queries

Problem description

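I am evaluating Neo4j Community Edition with data containing millions of nodes and relationships. I wrote a threaded application to write the data in parallel. With four threads working in parallel, the total write time dropped by a factor of 4, but the number of nodes written to the DB also dropped by a factor of 4.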

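It looks as if no parallel writing is actually happening. There are no dependencies between the data handled by the different threads, and no errors are raised.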

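This is the behavior I see:

  1. Without threads, I wrote 100k nodes in 1 hour.
  2. With 4 threads, I tried to write 100k nodes. It finished in 15 minutes, but I see that only 25k nodes were written.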

I used thread.join() to wait for the threads to finish. I am using Python.
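For reference, here is a minimal sketch of the threading pattern described above, with the database call replaced by a placeholder. The names `write_node`, `worker`, and `run_parallel` are hypothetical and are not taken from the actual application. Each thread receives its own slice of the input, and the list of written items can be compared against the input after `join()` to check whether every thread processed its whole share.

```python
import threading


def write_node(item, written, lock):
    # Placeholder for the real database write (assumption: one call per node).
    with lock:
        written.append(item)


def worker(items, written, lock):
    # Each thread walks only its own slice of the input.
    for item in items:
        write_node(item, written, lock)


def run_parallel(items, num_threads=4):
    written = []
    lock = threading.Lock()
    # Strided slices: thread i takes items i, i+n, i+2n, ...
    # so every item lands in exactly one slice.
    slices = [items[i::num_threads] for i in range(num_threads)]
    threads = [
        threading.Thread(target=worker, args=(s, written, lock))
        for s in slices
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before counting
    return written
```

If the count returned here matches the input size while the database still holds only a quarter of the nodes, the loss is more likely in the write path than in how the work was split.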

Update: I am adding my queries below for reference.


create (session:Session {session_id: 'session_id1'})
;

match (s:Session) where s.session_id='session_id1' with s
create (e1:Event {insert_id: "insert_id1"}) set e1:SeenPage
create (s)-[:CONTAINS]->(e1)
create (s)-[:FIRST_EVENT]->(e1)
merge (pp1:Properties {value: "sample-url-1"}) set pp1:Page merge (e1)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e1)-[:RELATED_TO]->(pp2)
;


match (e1:SeenPage) where e1.insert_id='insert_id1' with e1
create (e2:Event {insert_id: "insert_id2"}) set e2:Show merge (e1)-[:NEXT]->(e2) with e2
match (s:Session) where s.session_id='session_id1' with s, e2
create (s)-[:CONTAINS]->(e2)
merge (pp1:Properties {value: "occasions"}) set pp1:Category merge (e2)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sample-url-2"}) set pp2:Page merge (e2)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e2)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "child category"}) set pp4:Sub_Category merge (e2)-[:RELATED_TO]->(pp4)
;


match (e2:Show) where e2.insert_id='insert_id2' with e2
create (e3:Event {insert_id: "insert_id3"}) set e3:SeenPage merge (e2)-[:NEXT]->(e3) with e3
match (s:Session) where s.session_id='session_id1' with s, e3
create (s)-[:CONTAINS]->(e3)
merge (pp1:Properties {value: "/p-page-0"}) set pp1:Page merge (e3)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e3)-[:RELATED_TO]->(pp2)
;


match (e3:SeenPage) where e3.insert_id='insert_id3' with e3
create (e4:Event {insert_id: "insert_id4"}) set e4:Show merge (e3)-[:NEXT]->(e4) with e4
match (s:Session) where s.session_id='session_id1' with s, e4
create (s)-[:CONTAINS]->(e4)
merge (pp1:Properties {value: "rect1"}) set pp1:Category merge (e4)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "/p-page-1"}) set pp2:Page merge (e4)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e4)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "him"}) set pp4:Sub_Category merge (e4)-[:RELATED_TO]->(pp4)
;


match (e4:Show) where e4.insert_id='insert_id4' with e4
create (e5:Event {insert_id: "insert_id5"}) set e5:SeenPage merge (e4)-[:NEXT]->(e5) with e5
match (s:Session) where s.session_id='session_id1' with s, e5
create (s)-[:CONTAINS]->(e5)
merge (pp1:Properties {value: "/p-page-2"}) set pp1:Page merge (e5)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e5)-[:RELATED_TO]->(pp2)
;

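The data represents a user's journey on a website. A user starts a session and browses pages. Each action the user performs is recorded as an event, and every event has its own unique ID. The events are then linked in order using the :NEXT and :CONTAINS relationships. Events are not unique, which is why I had to use create rather than merge. The properties of an event are unique; they are created as nodes and then connected to the event with a RELATED_TO relationship.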

It looks like this:

#session contains events 
#events are connected with :next

(Session)-[:CONTAINS]->(Event1)-[:NEXT]->(Event2)<-[:CONTAINS]-(Session)
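The same structure can be written out as plain data. The sketch below assumes the events of one session arrive as an ordered list of IDs; the function name `session_relationships` is hypothetical. It lists the relationships that the queries above create for a session: one FIRST_EVENT, one CONTAINS per event, and a NEXT between each pair of consecutive events. The RELATED_TO links to property nodes are left out.

```python
def session_relationships(session_id, event_ids):
    """Return (start, type, end) triples for one session's ordered events."""
    rels = []
    if event_ids:
        # The session points at its first event.
        rels.append((session_id, "FIRST_EVENT", event_ids[0]))
    for event_id in event_ids:
        # The session contains every event.
        rels.append((session_id, "CONTAINS", event_id))
    for prev_id, next_id in zip(event_ids, event_ids[1:]):
        # Consecutive events are chained in order.
        rels.append((prev_id, "NEXT", next_id))
    return rels
```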

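A session can contain up to 100 events. Writing is currently slow: it takes 4 hours to write the data for 10k sessions, and each session contains 10 events on average. I am writing the data event by event using the Python Bolt connector.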
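One alternative to writing event by event is to send many rows per query using a list parameter and `UNWIND`. The sketch below is an assumption-laden illustration, not a tested fix for this workload: `BATCH_QUERY`, `chunked`, and `write_in_batches` are hypothetical names, the query covers only the Event node and its CONTAINS relationship (no NEXT, extra labels, or property nodes), and `session` is assumed to be any object with a `run(query, **params)` method, such as a session from the Neo4j Python driver.

```python
# Cypher that creates one Event per row and attaches it to its Session.
# Simplified on purpose: NEXT, extra labels and RELATED_TO are omitted.
BATCH_QUERY = """
UNWIND $rows AS row
MATCH (s:Session {session_id: row.session_id})
CREATE (e:Event {insert_id: row.insert_id})
CREATE (s)-[:CONTAINS]->(e)
"""


def chunked(rows, size):
    # Yield consecutive slices of at most `size` rows.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_in_batches(session, rows, batch_size=1000):
    """Run BATCH_QUERY once per chunk; return the number of queries sent."""
    batches = 0
    for batch in chunked(rows, batch_size):
        # Keyword arguments are passed to the query as parameters.
        session.run(BATCH_QUERY, rows=batch)
        batches += 1
    return batches
```

Each row is a dict such as `{"session_id": "session_id1", "insert_id": "insert_id1"}`. The batch size of 1000 is an arbitrary starting point, and whether this is faster for this data has not been measured.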

Any help would be greatly appreciated.

Tags: neo4j, neo4j-apoc, neo4j-python-driver

Solution

