sql-server - 测试 (90%) 和控制 (10%) 在零售商级别和 SQL 中的细分级别划分。怎么做?
问题描述
业务背景: 简单地说,我想将我的数据集拆分为测试和控制,并在未来运行一些活动。但是当我尝试拆分它时似乎涉及很多复杂性。
数据:
RetailerCode CID Segment
A6005 13SVC15 High
A6005 19VDE1F Low
A6005 1B3BD1F Medium
A6005 1B3HB48 Medium
A6005 1B3HB49 Low
A9006 1B3HB40 High
A9006 1B3HB41 High
A9006 1B3HB43 Low
A9006 1B3HB46 Medium
我有一个像这样的主表,其中包含客户及其零售商、细分市场等的列表(它有 30 多个列,但我不想在这里显示所有内容)。我想通过添加一列来拆分这个数据集,比如说“test_control”,它将告诉我们特定行是测试还是控制类别。我可以随机拆分,但我需要遵循以下规则,
- If a customer is tied to two or more retailers, then he should be in control group
- Each retailer will be provided with list of customers to target for the campaigns and the retailer will run the campaign. Here,
o Test-Control split should be done at Retailer level and then at segment level. For example, For each retailer
10% of their High customers to control and remaining 90% of their high customers to test.
10% of their Medium customers to control and remaining 90% of their Medium customers to test
10% of their Low customers to control and remaining 90% of their Low customers to test.
我可以分享我写的查询,但这会让你们完全困惑
但这是我遵循的高级方法
- Took out the records from the main table where customer occurred more than once and added a “test_control” column with “control” in it and saved it in a separate temp table 1
- Took out the records from the main table where customer occurred only once and saved it in a temp table 2
- On table 2, I tried to split to 10% (control) and 90% (test)
- Used union all to merge table 1 and table 2. But the ouput is totally wrong
预期产出
如您所见,它在细分级别也分为 10% 和 90%
请有人帮我解决这个问题。过去几天一直在努力,但没有运气
提前致谢
我写的查询
--Tagging each row whether it is repated more than twice or not
select * into #Repeat from (
SELECT CID, Count(*) as number,
case when Count(*)>1 then '1'
else '0'
end as repeat
FROM #Target GROUP BY CID
) temp
--Joining the above table to the master table and creating a new table
SELECT * into #T from(
select a.*,b.repeat from #Target a
left join #Repeat b ON a.CID = b.CID
) temp
alter table #T
add t_c Varchar(400)
UPDATE #T
SET t_c = CASE
WHEN repeat = '1' THEN 'control'
ELSE repeat
END
--creating a sub table which has no repetitive customer
SELECT * into #T1
from #T where t_c <> 'control'
--creating a sub table which has repetitive customer
select * from #T
SELECT * into #T2
from #T where t_c = 'control'
--splitting the table(#T1) into test and control
select * into #T3 from
(select *,
(case
when row_number() over (partition by RetailerCode,SEGMENT order by newid()) <= ((1-0.9-((select count(*) from #Target)/(select count(*) from #T2))) * count(*) over (partition by RetailerCode,SEGMENT)) then 'control'
else 'test'
end) as t_c_new
from #T1
where RETAILERCODE IN (SELECT DISTINCT RETAILERCODE FROM #Target
WHERE CID IN(
SELECT CID FROM
(SELECT CID, COUNT(*) AS NoOfOccurrences
FROM #Target GROUP BY CID
HAVING COUNT(*) = 1 ) temp
))
)temp1
--renaming a column
EXEC tempdb.sys.sp_rename N'#T2.t_c', N't_c_new', N'COLUMN';
ALTER TABLE #T3 DROP COLUMN t_c
--Merging the output
select * into #T4 from
(SELECT * from #T3 --1085612
union all
select * from #T2 -- 89622
) temp
--QA check…this is where I found out my logic is wrong
select RetailerCode, t_c_new, Segment
from
#T4 group by RetailerCode, t_c_new, Segment
order by RetailerCode, t_c_new, Segment
解决方案
如果您想将客户分配到特定组,您应该首先订购他们:
SELECT RetailerCode, CID, Segment,
CASE WHEN Percent_Rank()
Over (PARTITION BY retailercode, segment -- for each retailer/segment
ORDER BY ControlGroup, newid() -- all customers with multiple retailers are sorted low, i.e. will be in control group (if it's less than 10%)
) <= 0.1
THEN 'control'
ELSE 'test'
END AS GROUP
FROM
(
SELECT t.*,
-- flag customers to be put in control group
CASE WHEN Min(RetailerCode) Over (PARTITION BY CID)
= Max(RetailerCode) Over (PARTITION BY CID)
THEN 1 -- only a single retailer
ELSE 0 -- multiple retailers
END AS ControlGroup
-- if the RetailerCode/CID combination is unique:
-- CASE WHEN Count(*) Over (PARTITION BY CID) = 1
-- THEN 1 -- only a single retailer
-- ELSE 0 -- multiple retailers
-- END AS ControlGroup
FROM tab t
) AS dt;
推荐阅读
- angular - 我可以在 forRoot/forChild 方法中添加模块导入吗?
- android - 如何使 tableLayout 元素适合屏幕
- sql - 如何将mailto放在文本字符串中的电子邮件地址周围
- php - 将作曲家启用站点部署到生产时出现 Guzzle 401 错误?
- rest - Sharepoint Foundation REST 接口,更新对象失败
- ruby-on-rails - 外键的 ActiveRecord has_many 和 belongs_to 实现
- r - 通过 R 中的弹出窗口输入 Csv 文件
- python - 将 Python Pandas 行翻译成 R:
- python - 如何只记录某个级别名称 python 日志记录
- r - 如何恢复未保存的 R 代码?