首页 > 解决方案 > 从数据库中随机选择每个客户的订单

问题描述

DB-小提琴

CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    event_date DATE,
    customer VARCHAR,
    orderID VARCHAR,
    sales_volume DECIMAL
);

INSERT INTO sales
(event_date, customer, orderID, sales_volume)
VALUES 
('2020-01-08', 'Customer_A', 'Order_001', '130'),
('2020-01-12', 'Customer_A', 'Order_002', '120'),
('2020-01-22', 'Customer_A', 'Order_003', '115'),
('2020-01-22', 'Customer_C', 'Order_001', '300'),
('2020-01-23', 'Customer_C', 'Order_002', '500'),

('2020-04-08', 'Customer_B', 'Order_001', '325'),
('2020-04-12', 'Customer_B', 'Order_002', '875'),
('2020-04-15', 'Customer_B', 'Order_003', '910'),
('2020-04-20', 'Customer_B', 'Order_004', '723'),
('2020-04-30', 'Customer_C', 'Order_003', '665'),

('2020-06-01', 'Customer_B', 'Order_005', '982'),
('2020-06-15', 'Customer_B', 'Order_006', '100'),
('2020-06-19', 'Customer_C', 'Order_004', '250'),
('2020-06-20', 'Customer_C', 'Order_005', '322'),
('2020-06-25', 'Customer_A', 'Order_004', '445');

预期结果:

customer     |   orderid     |      event_date    |     sales_volume
-------------|---------------|--------------------|----------------------
Customer_A   |   Order_001   |      2020-01-08    |       130      
Customer_A   |   Order_003   |      2020-01-22    |       115
Customer_C   |   Order_002   |      2020-01-23    |       500
Customer_C   |   Order_001   |      2020-01-22    |       300
-------------|---------------|--------------------|------------------------
Customer_B   |   Order_002   |      2020-04-12    |       875
Customer_B   |   Order_003   |      2020-04-15    |       910
Customer_C   |   Order_003   |      2020-04-30    |       665
-------------|---------------|--------------------|------------------------
Customer_A   |   Order_004   |      2020-06-25    |       445
Customer_B   |   Order_005   |      2020-06-01    |       982
Customer_B   |   Order_006   |      2020-06-15    |       100
Customer_C   |   Order_005   |      2020-06-20    |       322
Customer_C   |   Order_004   |      2020-06-19    |       250

我有一个庞大的数据库,需要从中提取一些数据以进行案例研究。
问题是我需要提取全年的数据,因为我希望能够在案例研究中进行每月分析。因此,我不能用datesor限制提取LIMIT

因此,我解决这个问题的想法是一个查询,每个客户每月随机抽取最多两个订单

你知道这是否可能吗?
如果是,我需要如何修改以下查询?

SELECT
s.customer,
s.orderID,
s.event_date,
SUM(s.sales_volume) AS sales_volume
FROM sales s
GROUP BY 1,2,3
ORDER BY 1,2,3;

标签: sqlpostgresql

解决方案


因此,我解决这个问题的想法是一个查询,每个客户每月随机抽取最多两个订单。

您可以使用:

select s.*
from (select s.*,
             row_number() over (partition by customer, date_trunc('month', event_date) order by random()) as seqnum
      from sales s
     ) s
where seqnum <= 2;

不过,老实说,出于分析目的,我更愿意随机抽取客户样本——比如 1% 或 5%——以及他们的所有交易。


推荐阅读