c# - 如何使用实体框架(核心)解决每组最大 n 问题?
问题描述
问题
例如,给定以下数据集:
new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 4, Group = 1, Value = "LOP", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 6, Group = 2, Value = "PEO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
new Entity { Id = 9, Group = 2, Value = "QPA", ... },
new Entity { Id = 10, Group = 2, Value = "LAN", ... },
// ... millions more records
如何进行高效的查询(避免 N+1 查询问题)并为我提供每个Group
排序的前 3 条记录Value
?
new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
// ...
我尝试了什么?
Stack Overflow 上的大多数 LINQ 或实体框架解决方案都使用GroupBy
它Take
来评估客户端(这意味着所有记录都导入内存中,然后分组发生在数据库之外)。
我试过:
var list = await _dbContext.Entities
.Select(x => new
{
OrderKey = _dbContext.Entities.Count(y =>
x.Group == y.Group
&& y.Value < x.Value),
Value = x,
})
.Where(x => x.OrderKey < 3)
.OrderBy(x => x.OrderKey)
.Select(x => x.Value)
.ToListAsync(cancellationToken);
但我很确定这是效率低下的。
奖金问题
如何将此逻辑提取到IQueryable<T>
返回的扩展方法中IQueryable<T>
?
解决方案
有趣的问题。我看到的主要问题是没有用于执行此类操作的标准SQL 构造 - 大多数数据库都提供自己的操作符来处理行集“窗口”,例如 SqlServer 的SELECT - OVER等。也没有“标准”LINQ 操作符/ 模式。
给定
IQueryable<Entity> source
在 LINQ 中执行此类操作的典型方法是
var query = source.GroupBy(e => e.Group)
.SelectMany(g => g.OrderBy(e => e.Value).Take(3));
EF6 转换为以下 SQL
SELECT
[Limit1].[Id] AS [Id],
[Limit1].[Group] AS [Group],
[Limit1].[Value] AS [Value]
FROM (SELECT DISTINCT
[Extent1].[Group] AS [Group]
FROM [dbo].[Entity] AS [Extent1] ) AS [Distinct1]
CROSS APPLY (SELECT TOP (3) [Project2].[Id] AS [Id], [Project2].[Group] AS [Group], [Project2].[Value] AS [Value]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[Group] AS [Group],
[Extent2].[Value] AS [Value]
FROM [dbo].[Entity] AS [Extent2]
WHERE [Distinct1].[Group] = [Extent2].[Group]
) AS [Project2]
ORDER BY [Project2].[Value] ASC ) AS [Limit1]
我不能说它是好还是坏的翻译,但至少它是一些翻译。重要的是当前的 EF Core(在撰写本文时是最新的 2.2.3)无法将其转换为 SQL,并将使用客户端评估(如您所述)。
所以目前似乎只有 3 种可翻译的 LINQ 方式来编写这样的查询:
(1) (你的)
var query = source.Where(e => source.Count(
e2 => e2.Group == e.Group && e2.Value.CompareTo(e.Value) < 0) < 3);
翻译成
SELECT [e].[Id], [e].[Group], [e].[Value]
FROM [Entity] AS [e]
WHERE (
SELECT COUNT(*)
FROM [Entity] AS [e2]
WHERE ([e2].[Group] = [e].[Group]) AND [e2].[Value] < [e].[Value]
) < 3
(2)
var query = source.Where(e => source.Where(e2 => e2.Group == e.Group)
.OrderBy(e2 => e2.Value).Take(3).Contains(e));
翻译成
SELECT [e].[Id], [e].[Group], [e].[Value]
FROM [Entity] AS [e]
WHERE [e].[Id] IN (
SELECT TOP(3) [e2].[Id]
FROM [Entity] AS [e2]
WHERE [e2].[Group] = [e].[Group]
ORDER BY [e2].[Value]
)
(3)
var query = source.SelectMany(e => source.Where(e2 => e2.Group == e.Group)
.OrderBy(e2 => e2.Value).Take(3).Where(e2 => e2.Id == e.Id));
翻译成
SELECT [t].[Id], [t].[Group], [t].[Value]
FROM [Entity] AS [e]
CROSS APPLY (
SELECT TOP(3) [e2].[Id], [e2].[Group], [e2].[Value]
FROM [Entity] AS [e2]
WHERE [e2].[Group] = [e].[Group]
ORDER BY [e2].[Value]
) AS [t]
WHERE [t].[Id] = [e].[Id]
我不能说选择哪一个——你必须衡量执行计划。
#1 比较运算符的主要缺点(从示例中可以看出 - 不能<
用于string
s,因为Guid
s 更糟糕),如果Value
在分组内不是唯一的,也将无法正常工作。
从另一方面看,它可能是三者中最快的。但是#2 和#3(甚至#1)的执行计划可能是相同的。
话虽如此,我不会提供一个通用的方法,因为所有这些方法都需要不同的参数,最终只有共同的参数是组选择器Expression<Func<T, TGroupKey>>
(例如e => e.Group
)。但是(特别是对于 #2 和 #3)可以编写这样的方法 - 它需要一些手动Expression
操作,总的来说我不确定它是否值得付出努力