graph - Neo4j 笛卡尔积限制 - 查询优化
问题描述
给定以下架构
我需要获取Collection
每个列表以及用户指定的一组ProductCard
(它的Product
变体)匹配标准:
- 集合类型
- 1..4种产品(每一种选择
ProductType
必须有一个ProductCard
。) - 套装价格
我从这样的查询开始
MATCH (c:Collection {type: 'selected_collection_type'})<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt1:ProductType {title: '1st product type'}), (c)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt2:ProductType {title: '2nd product type'}),(c)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt3:ProductType {title: '3rd product type'}), (c)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt4:ProductType {title: '4th product type'})
CALL apoc.cypher.run('
WITH {c} AS c, {pt1} AS pt1, {pt2} AS pt2, {pt3} AS pt3, {pt4} AS pt4
MATCH (pt1)<-[:OF_TYPE]-(p1:Product)-[:FROM_COLLECTION]->(c), (pt2)<-[:OF_TYPE]-(p2:Product)-[:FROM_COLLECTION]->(c), (pt3)<-[:OF_TYPE]-(p3:Product)-[:FROM_COLLECTION]->(c), (pt4)<-[:OF_TYPE]-(p4:Product)-[:FROM_COLLECTION]->(c), (pc1:ProductCard)-[:VARIANT_OF]->(p1), (pc2:ProductCard)-[:VARIANT_OF]->(p2), (pc3:ProductCard)-[:VARIANT_OF]->(p3), (pc4:ProductCard)-[:VARIANT_OF]->(p4)
WHERE (pc1.price + pc2.price + pc3.price + pc4.price < price_margin_for_set)
RETURN pc1, pc2, pc3, pc4, (p1.weight + p2.weight + p3.weight + p4.weight) AS sweight ORDER BY sweight DESC LIMIT 1
', {c:c, pt1:pt1, pt2:pt2, pt3:pt3, pt4:pt4}) YIELD value
RETURN c, value ORDER BY value.sweight DESC LIMIT 8;
它适用于多达 3 种选定的产品类型,但是当我添加第 4 种产品类型时,速度会急剧下降。这里的问题是我只需要从子查询返回的 1 个集合,但是从所有产品变体(Product
可以有 1..~10 ProductCard
)计算的笛卡尔积对于 4 种类型来说相当大。
如何优化此查询以获得从子查询返回 1 组匹配价格标准所需的性能/减少变化计数?
这是EXPLAIN 解释
编辑:稍微改变了查询
WITH ['Product Type 1', 'Product Type 2', 'Product Type 3', 'Product Type 4'] as types
MATCH (c:Collection)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt:ProductType)
WHERE pt.title in types AND c.type = 'collection type'
WITH c, size(types) as inputCnt, count(DISTINCT pt) as cnt
WHERE cnt = inputCnt
CALL apoc.cypher.run('
WITH {c} AS c
MATCH (c)<-[:FROM_COLLECTION]-(p1:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 1"})
MATCH (pc1:ProductCard)-[:VARIANT_OF]->(p1)
MATCH (c)<-[:FROM_COLLECTION]-(p2:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 2"})
MATCH (pc2:ProductCard)-[:VARIANT_OF]->(p2)
MATCH (c)<-[:FROM_COLLECTION]-(p3:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 3"})
MATCH (pc3:ProductCard)-[:VARIANT_OF]->(p3)
MATCH (c)<-[:FROM_COLLECTION]-(p4:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 4"})
MATCH (pc4:ProductCard)-[:VARIANT_OF]->(p4)
WHERE (pc1.price + pc2.price + pc3.price + pc4.price < 1000)
RETURN pc1, pc2, pc3, pc4, (p1.weight + p2.weight + p3.weight + p4.weight) AS sweight ORDER BY sweight DESC LIMIT 1
', {c:c}) YIELD value
RETURN DISTINCT c, value LIMIT 8;
解释
子查询说明
解决方案
打破笛卡尔积的最简单方法是添加逻辑,使一组明显胜过其他组。例子
WHERE p1.value > p2.value > p3.value > p4.value
但是,此查询的性质仍然会导致许多集具有微小的微不足道的变化。我会改为 return WHERE p1.value < 1000 WITH COLLECT(p1) as possible
,并处理其余的客户端以最小化数据传输。(如果你真的想要,你可以在收集时按值对 p1 排序,然后对最小的 3 求和,然后过滤掉任何值大于1000-smallest_set
但感觉有点过分的东西)
推荐阅读
- botframework - 我们可以通过编程方式将文件作为附件与团队中的消息一起发送吗?
- matlab - matlab coder.extrinsic,定义/初始化更复杂的数据类型
- postgresql - 有没有办法以编程方式返回或释放与池的连接?
- prestashop - 提高移动版页面速度洞察排名
- node.js - NodeJS 缓冲区中的转义字节
- reactjs - 在反应js中获取cookie
- react-native - 如何在不使用 expo 更改应用程序的情况下更改 Web 应用程序
- c++ - 在 cpp 中的一元运算符重载中,如何确定我们需要在对象的哪一侧放置运算符符号?
- freemarker - Liferay 7:是否可以自动从链接站点获取图像到站点地图?
- c++ - 如何定义一个以捕获为参数的 lambda 的函数?