Iterated sub sampling against distinct values, union results

Problem description

I made a SQL fiddle here

I have a table that has, for each row, a category, a document ID and a ranking.

Within each category, the rows are ordered by rank. For each category I would like to select a subsample, and all the subsamples should be stacked together in one table.

The catch is that I would like to subsample by iteratively halving the row index within that category: e.g. if a given category has 32 items, then I would like to fetch rows 32, 16, 8, 4, 2, 1.
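A minimal sketch of the halving rule on its own, assuming SQL Server 2008 or later (for the inline DECLARE initializer). @n is a hypothetical stand-in for one category's row count. Because int / int truncates in T-SQL, a category with 19 rows would give 19, 9, 4, 2, 1.

```sql
-- Sketch only: @n is a hypothetical stand-in for one category's row count.
DECLARE @n int = 32;

WITH Halving (Idx) AS
(
    SELECT @n              -- anchor: start at the last row of the category
    UNION ALL
    SELECT Idx / 2         -- integer division truncates, e.g. 19 -> 9 -> 4 -> 2 -> 1
    FROM Halving
    WHERE Idx > 1          -- stop once index 1 has been produced
)
SELECT Idx
FROM Halving
ORDER BY Idx DESC;         -- 32, 16, 8, 4, 2, 1
```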

In my SQL fiddle I was able to do this for one particular category, but I can't figure out how to:

a) do it for all categories in [Major Focus Area]
b) union the resulting subsamples into one table

Any hints or help is much appreciated! I am working in T-SQL (MS SQL Server).

Sample data (MS SQL Server):

CREATE TABLE Rank_MajorAreas
    ([Rank] int, [Major Focus Area] varchar(17), [ID] int)
;

INSERT INTO Rank_MajorAreas
    ([Rank], [Major Focus Area], [ID])
VALUES
    (1, 'Welfare', 71366),
    (2, 'Welfare', 70415),
    (3, 'Truck Driving', 70423),
    (4, 'Peasant''s Office', 74566),
    (5, 'Peasant''s Office', 71560),
    (6, 'Nail Therapy', 77497),
    (7, 'Truck Driving', 76193),
    (8, 'Truck Driving', 79226),
    (9, 'Truck Driving', 70222),
    (10, 'Welfare', 77336),
    (11, 'Truck Driving', 70823),
    (12, 'Welfare', 77096),
    (13, 'Welfare', 71335),
    (14, 'Nail Therapy', 73551),
    (15, 'Welfare', 72146),
    (16, 'Truck Driving', 74023),
    (17, 'Welfare', 71546),
    (18, 'Nail Therapy', 74755),
    (19, 'Peasant''s Office', 77834),
    (20, 'Welfare', 75667),
    (21, 'Peasant''s Office', 71342),
    (22, 'Peasant''s Office', 77457),
    (23, 'Peasant''s Office', 77923),
    (24, 'Welfare', 76508),
    (25, 'Welfare', 75714),
    (26, 'Welfare', 73654),
    (27, 'Welfare', 75753),
    (28, 'Truck Driving', 71481),
    (29, 'Truck Driving', 79424),
    (30, 'Peasant''s Office', 76143),
    (31, 'Truck Driving', 74076),
    (32, 'Nail Therapy', 78714),
    (33, 'Nail Therapy', 79924),
    (34, 'Welfare', 71482),
    (35, 'Welfare', 70050),
    (36, 'Welfare', 76053),
    (37, 'Nail Therapy', 79591),
    (38, 'Peasant''s Office', 75197),
    (39, 'Nail Therapy', 74104),
    (40, 'Welfare', 72891),
    (41, 'Truck Driving', 73621),
    (42, 'Peasant''s Office', 71713),
    (43, 'Welfare', 71979),
    (44, 'Peasant''s Office', 71601),
    (45, 'Peasant''s Office', 73928),
    (46, 'Nail Therapy', 71759),
    (47, 'Nail Therapy', 70379),
    (48, 'Welfare', 71215),
    (49, 'Truck Driving', 70908),
    (50, 'Welfare', 71989)
;

Code thus far:

CREATE VIEW MFA AS
  SELECT ROW_NUMBER() OVER(ORDER BY fa.[Rank] ASC) AS Row
        ,*
  FROM Rank_MajorAreas AS fa
  -- ideally we could make a view per Focus Area
  WHERE fa.[Major Focus Area] = 'Welfare'
  ORDER BY Row ASC
  OFFSET 0 ROWS;
GO

DECLARE @start int
SELECT @start = (SELECT COUNT(*) FROM MFA)

;WITH Sample( Row ) AS
(
  Select @start as Row
    UNION ALL
  SELECT ROUND(Row/2, 0)
    FROM Sample
    WHERE Row > 0
)
SELECT * FROM MFA AS mfa
INNER JOIN Sample AS s on s.Row = mfa.Row
ORDER BY mfa.Row ASC

Desired results, where each focus area is subsampled and the subsamples are returned together as a single result:

Row  Rank  Major Focus Area  ID
1    1     Welfare           71366
2    2     Welfare           70415
4    12    Welfare           77096
9    24    Welfare           76508
19   50    Welfare           71989
...
1    6     Nail Therapy      77497
2    14    Nail Therapy      73551
4    32    Nail Therapy      78714
9    47    Nail Therapy      70379
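The row indices in the desired output follow from each category's row count, which is the first index fetched for that category. A quick sketch of that check against the sample table:

```sql
-- Row count per focus area; each count is the starting index of its halving sequence.
SELECT [Major Focus Area], COUNT(*) AS RowsInArea
FROM Rank_MajorAreas
GROUP BY [Major Focus Area]
ORDER BY [Major Focus Area];
```

With the sample data above this gives Nail Therapy 9, Peasant's Office 11, Truck Driving 11 and Welfare 19, so Welfare is sampled at rows 19, 9, 4, 2, 1 and Nail Therapy at rows 9, 4, 2, 1, matching the desired results.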

Tags: sql, sql-server, tsql

Solution


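You need to use PARTITION BY on the [Major Focus Area] column in the OVER clause, so that the row numbers restart at 1 for each focus area. The recursive CTE then has to start from one row count per focus area instead of a single @start value. Here is the modified T-SQL: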

CREATE VIEW MFA AS
  -- PARTITION BY restarts the numbering at 1 for each focus area
  SELECT ROW_NUMBER() OVER(PARTITION BY fa.[Major Focus Area] ORDER BY fa.[Rank] ASC) AS Row
        ,*
  FROM Rank_MajorAreas AS fa;
GO

-- Anchor: one row per focus area, holding its row count (the last index).
-- Recursive step: halve the index (integer division) until it reaches 1.
;WITH Sample( Row, fa ) AS
(
  SELECT COUNT(*) AS Row, [Major Focus Area] AS fa
    FROM MFA
    GROUP BY [Major Focus Area]
    UNION ALL
  SELECT Row / 2, fa
    FROM Sample
    WHERE Row > 1
)
SELECT mfa.Row, mfa.[Rank], mfa.[Major Focus Area], mfa.ID
  FROM MFA AS mfa
  INNER JOIN Sample AS s ON s.Row = mfa.Row AND s.fa = mfa.[Major Focus Area]
  ORDER BY mfa.[Major Focus Area], mfa.Row ASC;

SQL Fiddle
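If creating a view is not wanted, the same approach can be sketched as a single statement by moving the numbering into a non-recursive CTE that the recursive CTE then reads from (SQL Server allows both in one WITH clause, and GROUP BY is permitted in the anchor member). The names Numbered, Halving, RowNo and Area are illustrative, not from the original answer.

```sql
-- Sketch: same logic as the view-based answer, written as one statement.
WITH Numbered AS
(
    -- Number rows 1..n by rank, restarting for each focus area
    SELECT ROW_NUMBER() OVER (PARTITION BY fa.[Major Focus Area]
                              ORDER BY fa.[Rank]) AS RowNo,
           fa.[Rank], fa.[Major Focus Area], fa.ID
    FROM Rank_MajorAreas AS fa
),
Halving (RowNo, Area) AS
(
    -- Anchor: one row per focus area, starting at its last row number
    SELECT COUNT(*), [Major Focus Area]
    FROM Numbered
    GROUP BY [Major Focus Area]
    UNION ALL
    -- Recursive step: halve the index with integer division until it reaches 1
    SELECT RowNo / 2, Area
    FROM Halving
    WHERE RowNo > 1
)
SELECT n.RowNo AS [Row], n.[Rank], n.[Major Focus Area], n.ID
FROM Numbered AS n
INNER JOIN Halving AS h
    ON h.RowNo = n.RowNo AND h.Area = n.[Major Focus Area]
ORDER BY n.[Major Focus Area], n.RowNo;
```

Against the sample data this should return 17 rows: 4 each for Nail Therapy (9 rows), Peasant's Office (11) and Truck Driving (11), and 5 for Welfare (19). The recursion depth is at most about 31 levels for an int count, well inside SQL Server's default MAXRECURSION limit of 100.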

