csv - Is there a way to speed up LOAD CSV of 120M relationships into 10M nodes avoiding cartesian product in Neo4j?
Problem description
I am trying to create 120M relationships between 10M (:Homes) nodes. I have already created all the (:Homes) nodes and an index on :Homes(id):
CREATE INDEX ON :Homes(id)
This is my query for inserting into the database from a local CSV file. Each row in the CSV file has a home1_id and a home2_id, and I am trying to create a relationship home1 --> home2:
USING PERIODIC COMMIT 50000
LOAD CSV WITH HEADERS FROM "file:///relationships.csv" AS row
MATCH (home1:Homes {id: toInteger(row.home1_id)}),(home2:Homes {id: toInteger(row.home2_id)})
CREATE (home1)-[:Recommends]->(home2)
Running this currently seems like it is going to take 1-2 hours. Are there any optimizations I can make?
Solution
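The answer body did not survive in this capture. A common approach for bulk relationship loads of this size, sketched here as an assumption rather than the original answer, is to split the comma-joined MATCH into two separate MATCH clauses (so the planner stops warning about a cartesian product and each lookup is a plain :Homes(id) index seek) and to drive the load with APOC's apoc.periodic.iterate, which batches the work into separate committed transactions:

```
// Sketch, assuming the APOC plugin is installed.
// The outer query streams rows; the inner query runs in batches of 50,000.
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row RETURN row",
  "MATCH (home1:Homes {id: toInteger(row.home1_id)})
   MATCH (home2:Homes {id: toInteger(row.home2_id)})
   CREATE (home1)-[:Recommends]->(home2)",
  {batchSize: 50000, parallel: false}
)
```

parallel: false is the safe default here, since concurrent batches touching the same nodes can deadlock. If the database can be rebuilt from scratch instead of loaded incrementally, the offline neo4j-admin import tool is typically much faster for an initial load at this scale.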