首页 > 解决方案 > 如何使用实体框架(核心)解决每组最大 n 问题?

问题描述

问题

例如,给定以下数据集:

new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 4, Group = 1, Value = "LOP", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 6, Group = 2, Value = "PEO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
new Entity { Id = 9, Group = 2, Value = "QPA", ... },
new Entity { Id = 10, Group = 2, Value = "LAN", ... },
// ... millions more records

如何进行高效的查询(避免 N+1 查询问题)并为我提供每个Group排序的前 3 条记录Value

new Entity { Id = 1, Group = 1, Value = "ABC", ... },
new Entity { Id = 2, Group = 1, Value = "DEF", ... },
new Entity { Id = 3, Group = 1, Value = "FGH", ... },
new Entity { Id = 5, Group = 2, Value = "ALO", ... },
new Entity { Id = 7, Group = 2, Value = "AHB", ... },
new Entity { Id = 8, Group = 2, Value = "DHB", ... },
// ...

我尝试了什么?

Stack Overflow 上的大多数 LINQ 或实体框架解决方案都使用GroupByTake来评估客户端(这意味着所有记录都导入内存中,然后分组发生在数据库之外)。

我试过:

var list = await _dbContext.Entities
    .Select(x => new 
    { 
        OrderKey = _dbContext.Entities.Count(y =>
            x.Group == y.Group
                && y.Value < x.Value),
        Value = x,
     })
     .Where(x => x.OrderKey < 3)
     .OrderBy(x => x.OrderKey)
     .Select(x => x.Value)
     .ToListAsync(cancellationToken);

但我很确定这是效率低下的。

奖金问题

如何将此逻辑提取到IQueryable<T>返回的扩展方法中IQueryable<T>

标签: c#entity-frameworklinqentity-framework-coregreatest-n-per-group

解决方案


有趣的问题。我看到的主要问题是没有用于执行此类操作的标准SQL 构造 - 大多数数据库都提供自己的操作符来处理行集“窗口”,例如 SqlServer 的SELECT - OVER等。也没有“标准”LINQ 操作符/ 模式。

给定

IQueryable<Entity> source

在 LINQ 中执行此类操作的典型方法是

var query = source.GroupBy(e => e.Group)
    .SelectMany(g => g.OrderBy(e => e.Value).Take(3));

EF6 转换为以下 SQL

SELECT
    [Limit1].[Id] AS [Id],
    [Limit1].[Group] AS [Group],
    [Limit1].[Value] AS [Value]
    FROM   (SELECT DISTINCT
        [Extent1].[Group] AS [Group]
        FROM [dbo].[Entity] AS [Extent1] ) AS [Distinct1]
    CROSS APPLY  (SELECT TOP (3) [Project2].[Id] AS [Id], [Project2].[Group] AS [Group], [Project2].[Value] AS [Value]
        FROM ( SELECT
            [Extent2].[Id] AS [Id],
            [Extent2].[Group] AS [Group],
            [Extent2].[Value] AS [Value]
            FROM [dbo].[Entity] AS [Extent2]
            WHERE [Distinct1].[Group] = [Extent2].[Group]
        )  AS [Project2]
        ORDER BY [Project2].[Value] ASC ) AS [Limit1]

我不能说它是好还是坏的翻译,但至少它是一些翻译。重要的是当前的 EF Core(在撰写本文时是最新的 2.2.3)无法将其转换为 SQL,并将使用客户端评估(如您所述)。

所以目前似乎只有 3 种可翻译的 LINQ 方式来编写这样的查询:

(1) (你的)

var query = source.Where(e => source.Count(
    e2 => e2.Group == e.Group && e2.Value.CompareTo(e.Value) < 0) < 3);

翻译成

  SELECT [e].[Id], [e].[Group], [e].[Value]
  FROM [Entity] AS [e]
  WHERE (
      SELECT COUNT(*)
      FROM [Entity] AS [e2]
      WHERE ([e2].[Group] = [e].[Group]) AND [e2].[Value] < [e].[Value]
  ) < 3

(2)

var query = source.Where(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Contains(e));

翻译成

  SELECT [e].[Id], [e].[Group], [e].[Value]
  FROM [Entity] AS [e]
  WHERE [e].[Id] IN (
      SELECT TOP(3) [e2].[Id]
      FROM [Entity] AS [e2]
      WHERE [e2].[Group] = [e].[Group]
      ORDER BY [e2].[Value]
  )

(3)

var query = source.SelectMany(e => source.Where(e2 => e2.Group == e.Group)
    .OrderBy(e2 => e2.Value).Take(3).Where(e2 => e2.Id == e.Id));

翻译成

  SELECT [t].[Id], [t].[Group], [t].[Value]
  FROM [Entity] AS [e]
  CROSS APPLY (
      SELECT TOP(3) [e2].[Id], [e2].[Group], [e2].[Value]
      FROM [Entity] AS [e2]
      WHERE [e2].[Group] = [e].[Group]
      ORDER BY [e2].[Value]
  ) AS [t]
  WHERE [t].[Id] = [e].[Id]

我不能说选择哪一个——你必须衡量执行计划。

#1 比较运算符的主要缺点(从示例中可以看出 - 不能<用于strings,因为Guids 更糟糕),如果Value在分组内不是唯一的,也将无法正常工作。

从另一方面看,它可能是三者中最快的。但是#2 和#3(甚至#1)的执行计划可能是相同的。

话虽如此,我不会提供一个通用的方法,因为所有这些方法都需要不同的参数,最终只有共同的参数是组选择器Expression<Func<T, TGroupKey>>(例如e => e.Group)。但是(特别是对于 #2 和 #3)可以编写这样的方法 - 它需要一些手动Expression操作,总的来说我不确定它是否值得付出努力


推荐阅读