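sql - How to create a gender breakdown (ratio) for different products/categories and its change over time
Problem description
I'm completely new here. After lurking in forum after forum looking for an answer to this problem, I decided to build it myself, but my solution turned out too long. I would appreciate contributions with simpler solutions or approaches. I'll try to be as detailed as possible, so be prepared, this is going to be a long thread. Here we go:
Problem: A client wants to know whether this can be done with SQL Server: what is the gender breakdown for each of our programs, and how has it changed over time during the last quarter of the year?
They also provided the following column names: ClientID, ClientName, Program, StartDate, Gender, Location.
It would not be very complicated with a tool such as Excel, Tableau, or PowerBI, but it did get me thinking about how to do it in SQL.
So, for brevity, I first decided to create two programs: Program_A and Program_B.
Then I would create a pivot table and run a query whose output should look like this:
Anyway, on to the test. First, I created a table named General and gave it some data (apologies for the long, repetitive code block ahead):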
CREATE TABLE General(
ClientID int IDENTITY(1,1) NOT NULL,
ClientName varchar(20) NOT NULL,
Program varchar(20) NOT NULL,
StartDate date,
Gender varchar(30) NULL,
Location varchar(30) NULL)
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('John Doe', 'Program_A', '2020-10-01', 'Male', 'US')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Chewbaka Girl', 'Program_B', '2020-10-01', 'Female', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Jane Doe', 'Program_A', '2020-12-01', 'Male', 'UK')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Carol Smith', 'Program_A', '2020-11-01', 'Female', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Pedro Mostaza', 'Program_B', '2020-11-01', 'Male', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Jean Plurier', 'Program_A', '2020-12-01', 'Male', 'UK')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Nicole Kiteman', 'Program_A', '2020-12-01', 'Female', 'US')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Sonia Cepeda', 'Program_B', '2020-10-01', 'Female', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Alejandra Moncayo', 'Program_A', '2020-11-01', 'Female', 'UK')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Britanny Royce', 'Program_A', '2020-11-01', 'Female', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Arnold Lotfrey', 'Program_A', '2020-10-01', 'Male', 'US')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Richard Books', 'Program_B', '2020-11-01', 'Male', 'UK')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Camero Lovely', 'Program_B', '2020-11-01', 'Female', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Henry Lakes', 'Program_B', '2020-11-01', 'Male', 'UK')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Cameron Lovely', 'Program_B', '2020-12-01', 'Female', 'US')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Paula Mint', 'Program_A', '2020-11-01', 'Female', 'US')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Shirley Timer', 'Program_A', '2020-12-01', 'Female', 'CA')
INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Andrew Rocks', 'Program_A', '2020-10-01', 'Female', 'CA')
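Second, I wrote a small query using a CTE (WITH) and CASE expressions to split the Gender column into separate Male / Female columns and turn their values into counts, so that percentages can be calculated later: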
WITH CTE
AS (SELECT Program, StartDate,
COUNT(CASE WHEN Gender='Male' THEN 1 END) As Male,
COUNT(CASE WHEN Gender='Female' THEN 1 END) As Female,
COUNT(CASE WHEN (Gender='' OR Gender IS Null) THEN 1 END) As 'NotAssigned'
FROM General
GROUP BY Program, StartDate)
SELECT Program, StartDate, Male, Female,
Male*100.0/(Male + Female) as Male_Ratio,
Female*100.0/(Male + Female) as Female_Ratio
INTO Program_GenderBreakdown
from CTE;
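One caveat with the ratio step: if a Program/StartDate group contained only rows with a blank or NULL Gender, Male + Female would be 0 and SQL Server would raise a divide-by-zero error. The sample data never hits this case, so the following is only a minimal sketch of a guard, assuming the same General table and assuming a NULL ratio is acceptable for such groups:

```sql
-- Sketch: same aggregation as the CTE query, but NULLIF turns a zero
-- denominator into NULL, so the division yields NULL instead of an error.
WITH CTE AS (
    SELECT Program, StartDate,
           COUNT(CASE WHEN Gender = 'Male'   THEN 1 END) AS Male,
           COUNT(CASE WHEN Gender = 'Female' THEN 1 END) AS Female
    FROM General
    GROUP BY Program, StartDate
)
SELECT Program, StartDate, Male, Female,
       Male   * 100.0 / NULLIF(Male + Female, 0) AS Male_Ratio,
       Female * 100.0 / NULLIF(Male + Female, 0) AS Female_Ratio
FROM CTE;
```

This version only selects the rows and does not write to Program_GenderBreakdown, so it can be run alongside the original query.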
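Third, I pivoted the ratios so that each StartDate becomes its own column for easier reading, with one pivot for Male and one for Female combined through UNION ALL: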
SELECT * INTO Results
FROM (
SELECT
Program, Male_Ratio AS Percentage, StartDate, 'Male' AS Gender
FROM Program_GenderBreakdown
) T --temporary name
PIVOT (
SUM(Percentage)
FOR StartDate
IN (
[2020-10-01],
[2020-11-01],
[2020-12-01]
)
) AS PvtMale
UNION ALL --then unite both male and female pivots
--query For female
SELECT * FROM (
SELECT
Program, Female_Ratio AS Percentage, StartDate,'Female' AS Gender
FROM Program_GenderBreakdown
) T
PIVOT (
SUM(Percentage)
FOR StartDate
IN (
[2020-10-01],
[2020-11-01],
[2020-12-01]
)
) AS PvtFemale
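The two PIVOT blocks differ only in which ratio column they read, so the same shape can probably be produced in one pass. A hedged sketch, assuming the Program_GenderBreakdown table built earlier, that first unpivots the two ratio columns with CROSS APPLY (VALUES ...) and then spreads the dates with conditional aggregation (MAX is used here in place of SUM, which is equivalent because there is one row per Program and StartDate):

```sql
-- Sketch: one row per Program and Gender, one column per StartDate,
-- without PIVOT or UNION ALL.
SELECT p.Program,
       v.Gender,
       MAX(CASE WHEN p.StartDate = '2020-10-01' THEN v.Percentage END) AS [2020-10-01],
       MAX(CASE WHEN p.StartDate = '2020-11-01' THEN v.Percentage END) AS [2020-11-01],
       MAX(CASE WHEN p.StartDate = '2020-12-01' THEN v.Percentage END) AS [2020-12-01]
FROM Program_GenderBreakdown AS p
CROSS APPLY (VALUES ('Male',   p.Male_Ratio),
                    ('Female', p.Female_Ratio)) AS v (Gender, Percentage)
GROUP BY p.Program, v.Gender;
```

The date list is still hard-coded, as in the PIVOT version; supporting arbitrary months would need dynamic SQL.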
Fourth, using CAST, I rounded the values to 2 decimal places for readability, aliased the date columns with month names, and added month-over-month changes:
SELECT Program, Gender,
CAST([2020-10-01] AS DECIMAL(19,2)) AS [October],
CAST([2020-11-01] AS DECIMAL(19,2)) AS [November],
CAST([2020-12-01] AS DECIMAL(19,2)) AS [December],
-- relative change versus the previous month; -1 is returned when the previous month is 0
CASE WHEN [2020-10-01]=0 THEN -1 ELSE CAST((([2020-11-01]/[2020-10-01])-1) AS DECIMAL(19,2)) END AS [MoMOct-Nov],
CASE WHEN [2020-11-01]=0 THEN -1 ELSE CAST((([2020-12-01]/[2020-11-01])-1) AS DECIMAL(19,2)) END AS [MoMNov-Dec]
FROM Results
ORDER BY Program, Gender DESC
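The month-over-month columns above depend on hard-coded date columns. As an alternative sketch, a window function can compare each month with the previous one directly on the long-form table. This assumes SQL Server 2012 or later (for LAG) and the Program_GenderBreakdown table from the second step, and it reports the change in percentage points rather than the relative change used above:

```sql
-- Sketch: difference between each month's male ratio and the previous
-- month's, per program. The first month of each program has no
-- predecessor, so its change is NULL.
SELECT Program,
       StartDate,
       Male_Ratio,
       Male_Ratio - LAG(Male_Ratio) OVER (PARTITION BY Program
                                          ORDER BY StartDate) AS Male_Ratio_Change
FROM Program_GenderBreakdown
ORDER BY Program, StartDate;
```

Because Female_Ratio is 100 minus Male_Ratio whenever a group has at least one assigned gender, the female change is simply the negative of this value.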
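The result is not quite what I had in mind, because the table comes out transposed relative to what I wanted, but the calculations work so far. Here is the final result:
Conclusion: the calculations work, but how can I make the table look like what I originally intended, and is it even worth it? Thanks in advance for any help. Any feedback on this thread from passers-by would also be appreciated.
Solution
You can use conditional aggregation to achieve this, as follows: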
SELECT CONCAT(Program,'_',datename(month,startdate)) as ProgramMonth,
convert(decimal(5,2),(COUNT(CASE WHEN Gender = 'Male' then 1 end) * 1.0/COUNT(*)) * 100) as MaleCount,
convert(decimal(5,2),(COUNT(CASE WHEN Gender = 'Female' then 1 end) * 1.0/count(*)) * 100) as FemaleCount
FROM general
group by CONCAT(Program,'_',datename(month,startdate))
ProgramMonth | MaleCount | FemaleCount |
---|---|---|
Program_A_December | 50.00 | 50.00 |
Program_A_November | 0.00 | 100.00 |
Program_A_October | 66.67 | 33.33 |
Program_B_December | 0.00 | 100.00 |
Program_B_November | 75.00 | 25.00 |
Program_B_October | 0.00 | 100.00 |
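Two details of this output may matter in practice: the rows sort alphabetically by month name rather than chronologically, and the query is not restricted to the last quarter. A minimal sketch of a variant, assuming the same General table and assuming the quarter of interest is Q4 2020, that keeps Program and month as separate columns and orders by month number (the names StartMonth, MalePct, and FemalePct are illustrative, not from the original answer):

```sql
-- Sketch: percentage of each gender per program and month, limited to
-- Q4 2020 and ordered chronologically.
SELECT Program,
       DATENAME(month, StartDate) AS StartMonth,
       CONVERT(decimal(5,2),
               COUNT(CASE WHEN Gender = 'Male'   THEN 1 END) * 100.0 / COUNT(*)) AS MalePct,
       CONVERT(decimal(5,2),
               COUNT(CASE WHEN Gender = 'Female' THEN 1 END) * 100.0 / COUNT(*)) AS FemalePct
FROM General
WHERE StartDate >= '2020-10-01'
  AND StartDate <  '2021-01-01'
GROUP BY Program, MONTH(StartDate), DATENAME(month, StartDate)
ORDER BY Program, MONTH(StartDate);
```

Like the answer's query, this divides by COUNT(*), so rows with a blank or NULL Gender count toward the denominator.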