Aggregating related nodes across all collections in Neo4j Cypher

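Question

I recently started using Neo4j/Cypher and have been able to build most of the basic queries I could think of, but this one eludes me.

The nodes have a very simple relationship model: books are grouped into categories.

The books are unique, and each book can be related to multiple categories.

My basic query collects the categories, producing a set of books with their related categories: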

match (c:Category)-[:contains]-(b:Book)
return b as book, collect(distinct c) as categories

I can then collect the books, producing sets of related books and categories:

match (c:Category)-[:contains]-(b:Book)
with b, collect(distinct c) as categories
return collect(distinct b) as books, categories

This seems to be heading in the right direction, but there are many duplicate books and categories throughout. Here is a pseudo-example:

Books                         Categories
-----------------------------------------------
[Easy Home Updates]           [Home and Garden]
-----------------------------------------------
[Gardening Today,             [Outdoors,
 Gardening for Kids,           Hobbies,
 Green Thumb Made Easy]        Gardening]
-----------------------------------------------
[Conversational Spanish,      [Spanish,
 Spanish for Travelers,        Travel,
 Advanced Spanish]             Language]
-----------------------------------------------
[Gardening Today,             [Gardening,
 Gardening for Kids]           Kids]
-----------------------------------------------
[Home Improvement,            [Home Improvement,
 Easy Home Updates,            Home and Garden,
 Family Home Projects]         Family]
-----------------------------------------------
[Gardening Today]             [Gardening]
-----------------------------------------------
[Conversational Spanish,      [Language,
 Advanced Spanish]             Spanish]

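I can't seem to find a way to aggregate the duplicates, either by filtering in the initial match or by using reduce and APOC functions.

The desired result is a reduced collection of books and categories. Something like this: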

Books                         Categories
----------------------------------------------
[Gardening Today,             [Gardening,
 Gardening for Kids,           Outdoors,
 Green Thumb Made Easy]        Hobbies,
                               Kids,
                               Family]
----------------------------------------------
[Conversational Spanish,      [Spanish,
 Spanish for Travelers,        Language,
 Advanced Spanish]             Travel,
                               Education]
----------------------------------------------
[Home Improvement,            [Home and Garden,
 Easy Home Updates,            Home Improvement,
 Family Home Projects]         Construction]

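Or maybe my approach is completely off, and there is a better, more efficient way to group related nodes.

Thanks a lot for any help pointing me in the right direction. Let me know if you need any further clarification.

Tags: neo4j, cypher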

Answer


Creating the model

To make further answers and solutions easier, here is my graph creation statement:

CREATE
  (categoryHome:Category {name: 'Home and Garden'}),
  (categoryOutdoor:Category {name: 'Outdoors'}),
  (categoryHobby:Category {name: 'Hobbies'}),
  (categoryGarden:Category {name: 'Gardening'}),
  (categorySpanish:Category {name: 'Spanish'}),
  (categoryTravel:Category {name: 'Travel'}),
  (categoryLanguage:Category {name: 'Language'}),
  (categoryKids:Category {name: 'Kids'}),
  (categoryImprovement:Category {name: 'Home Improvement'}),
  (categoryFamily:Category {name: 'Family'}),
  (book1:Book {name: 'Easy Home Updates'}),
  (book2:Book {name: 'Gardening Today'}),
  (book3:Book {name: 'Gardening for Kids'}),
  (book4:Book {name: 'Green Thumb Made Easy'}),
  (book5:Book {name: 'Conversational Spanish'}),
  (book6:Book {name: 'Spanish for Travelers'}),
  (book7:Book {name: 'Advanced Spanish'}),
  (book8:Book {name: 'Home Improvement'}),
  (book9:Book {name: 'Easy Home Updates'}),
  (book10:Book {name: 'Family Home Projects'}),
  (categoryHome)-[:CONTAINS]->(book1),
  (categoryHome)-[:CONTAINS]->(book8),
  (categoryHome)-[:CONTAINS]->(book9),
  (categoryHome)-[:CONTAINS]->(book10),
  (categoryOutdoor)-[:CONTAINS]->(book2),
  (categoryOutdoor)-[:CONTAINS]->(book3),
  (categoryOutdoor)-[:CONTAINS]->(book4),
  (categoryHobby)-[:CONTAINS]->(book2),
  (categoryHobby)-[:CONTAINS]->(book3),
  (categoryHobby)-[:CONTAINS]->(book4),
  (categoryGarden)-[:CONTAINS]->(book2),
  (categoryGarden)-[:CONTAINS]->(book3),
  (categoryGarden)-[:CONTAINS]->(book4),
  (categorySpanish)-[:CONTAINS]->(book5),
  (categorySpanish)-[:CONTAINS]->(book6),
  (categorySpanish)-[:CONTAINS]->(book7),
  (categoryTravel)-[:CONTAINS]->(book5),
  (categoryTravel)-[:CONTAINS]->(book6),
  (categoryTravel)-[:CONTAINS]->(book7),
  (categoryLanguage)-[:CONTAINS]->(book5),
  (categoryLanguage)-[:CONTAINS]->(book6),
  (categoryLanguage)-[:CONTAINS]->(book7),
  (categoryKids)-[:CONTAINS]->(book2),
  (categoryKids)-[:CONTAINS]->(book3),
  (categoryImprovement)-[:CONTAINS]->(book8),
  (categoryImprovement)-[:CONTAINS]->(book9),
  (categoryImprovement)-[:CONTAINS]->(book10),
  (categoryFamily)-[:CONTAINS]->(book8),
  (categoryFamily)-[:CONTAINS]->(book9),
  (categoryFamily)-[:CONTAINS]->(book10);

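Explanation

In my opinion your technical implementation is right, but from a domain perspective your requirements are inconsistent. Let's pick an example. You expect the following record: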

BOOKS:                        CATEGORIES:
Gardening Today,              Gardening,
Gardening for Kids,           Outdoors,
Green Thumb Made Easy         Hobbies,
                              Kids,
                              Family

Executing the following Cypher query shows that the entry Family is not a valid category for the book Gardening Today:

MATCH (book:Book {name: 'Gardening Today'})<-[:CONTAINS]-(category:Category)
RETURN DISTINCT book.name, collect(category.name);

╒═════════════════╤═════════════════════════════════════════╕
│"book.name"      │"collect(category.name)"                 │
╞═════════════════╪═════════════════════════════════════════╡
│"Gardening Today"│["Kids","Gardening","Hobbies","Outdoors"]│
└─────────────────┴─────────────────────────────────────────┘

A cross-check confirms that the category Family contains entirely different books:

MATCH (category:Category {name: 'Family'})-[:CONTAINS]->(book:Book)
RETURN DISTINCT category.name, collect(book.name);
╒═══════════════╤═══════════════════════════════════════════════════════════════╕
│"category.name"│"collect(book.name)"                                           │
╞═══════════════╪═══════════════════════════════════════════════════════════════╡
│"Family"       │["Family Home Projects","Easy Home Updates","Home Improvement"]│
└───────────────┴───────────────────────────────────────────────────────────────┘

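This continues to propagate through the rest of the data, which is why you get differently sliced result sets, as is to be expected. So the approach you have already implemented is correct: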

MATCH path = (category:Category)-[:CONTAINS]->(book:Book)
WITH collect(category.name) AS categoryGroup, book.name AS bookName
RETURN categoryGroup, collect(bookName);

╒═════════════════════════════════════════════════════════════════╤═════════════════════════════════════════════════════════════════════╕
│"categoryGroup"                                                  │"collect(bookName)"                                                  │
╞═════════════════════════════════════════════════════════════════╪═════════════════════════════════════════════════════════════════════╡
│["Spanish","Travel","Language"]                                  │["Spanish for Travelers","Advanced Spanish","Conversational Spanish"]│
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Home Improvement","Family","Home and Garden","Home and Garden"]│["Easy Home Updates"]                                                │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Hobbies","Gardening","Kids","Outdoors"]                        │["Gardening Today","Gardening for Kids"]                             │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Hobbies","Gardening","Outdoors"]                               │["Green Thumb Made Easy"]                                            │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Home Improvement","Family","Home and Garden"]                  │["Home Improvement","Family Home Projects"]                          │
└─────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘
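One row above lists "Home and Garden" twice. That happens because the CREATE statement defines two distinct Book nodes named Easy Home Updates (book1 and book9) and the query groups by book.name. The following is a minimal sketch, assuming it is acceptable to group by the Book node itself and to sort the category names so that equal category sets produce the same grouping key:

```cypher
// Sketch: group by the Book node rather than by its name, and sort the
// category names so that equal category sets yield equal grouping keys.
MATCH (category:Category)-[:CONTAINS]->(book:Book)
WITH book, category.name AS categoryName
  ORDER BY categoryName
WITH book, collect(DISTINCT categoryName) AS categoryGroup
RETURN categoryGroup, collect(DISTINCT book.name) AS bookNames;
```

This still returns one row per distinct category set, so it does not merge overlapping groups; it only removes the duplication caused by identical names and by collection order.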

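Extension

Basic idea

Because the requested mapping violates the assignment rules (set theory), we cannot use ordinary pattern matching. Instead, we can reach the goal with a trick: find all nodes connected to a given book and sort them out afterwards.

Please make sure you have the Neo4j APOC library installed.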
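If installing APOC is not an option, the same idea can probably be expressed in plain Cypher with an undirected variable-length pattern. This is only a sketch under the assumption that the graph is small: the pattern enumerates every path, so it may become very slow on larger or densely connected data, which is where a subgraph procedure is likely the better choice.

```cypher
// Sketch without APOC: follow CONTAINS relationships in either direction,
// zero or more hops, starting from the selected book.
MATCH (selectedBook:Book {name: 'Gardening for Kids'})-[:CONTAINS*0..]-(node)
WITH collect(DISTINCT node) AS connectedNodes
RETURN
  [n IN connectedNodes WHERE n:Category | n.name] AS categoryNames,
  [n IN connectedNodes WHERE n:Book | n.name] AS bookNames;
```

The lower bound of 0 keeps the selected book itself in the result, and the list comprehensions split the collected nodes by label.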


Solution

MATCH (selectedBook:Book)
  WHERE selectedBook.name = 'Gardening for Kids'
CALL apoc.path.subgraphNodes(selectedBook, {uniqueness: 'NODE_GLOBAL'}) YIELD node
WITH collect(DISTINCT node) AS subgraphNodes
WITH
  [node IN subgraphNodes
    WHERE node:Category] AS categories,
  [node IN subgraphNodes
    WHERE node:Book] AS books
WITH categories, books
UNWIND categories AS category
UNWIND books AS book
RETURN collect(DISTINCT category.name) AS categoryNames, collect(DISTINCT book.name) AS bookNames;

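Explanation

  • Lines 1-2: select the book to examine
  • Line 3: use the APOC procedure apoc.path.subgraphNodes to locate all connected nodes
  • Lines 6-9: separate the identified nodes by label, Category and Book
  • Lines 10-13: prepare the result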

Results

Easy Home Updates:

╒═══════════════════════════════════════════════╤═══════════════════════════════════════════════════════════════╕
│"categoryNames"                                │"bookNames"                                                    │
╞═══════════════════════════════════════════════╪═══════════════════════════════════════════════════════════════╡
│["Home and Garden","Family","Home Improvement"]│["Easy Home Updates","Family Home Projects","Home Improvement"]│
└───────────────────────────────────────────────┴───────────────────────────────────────────────────────────────┘

Gardening for Kids:

╒════════════════════════════════════════╤════════════════════════════════════════╕
│"categoryNames"                         │"bookNames"                             │
╞════════════════════════════════════════╪════════════════════════════════════════╡
│["Kids","Gardening","Hobbies","Outdoors"│["Gardening for Kids","Gardening Today",│
│]                                       │"Green Thumb Made Easy"]                │
└────────────────────────────────────────┴────────────────────────────────────────┘
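The queries above handle one selected book at a time, whereas the question asks for all groups at once. One possible sketch, assuming APOC is installed and that the smallest internal node id within a connected subgraph is an acceptable group key (groupId is a name introduced here for illustration, and id() is deprecated in newer Neo4j versions):

```cypher
// Sketch: label every book with the smallest internal node id found in its
// connected subgraph, then aggregate books and categories per label.
MATCH (book:Book)
CALL apoc.path.subgraphNodes(book, {relationshipFilter: 'CONTAINS'}) YIELD node
WITH book, min(id(node)) AS groupId
MATCH (category:Category)-[:CONTAINS]->(book)
RETURN groupId,
       collect(DISTINCT book.name) AS bookNames,
       collect(DISTINCT category.name) AS categoryNames;
```

Books that share a connected subgraph get the same groupId, so each returned row corresponds to one group of related books and categories. The procedure runs once per book, which may be costly on large graphs.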
