How to create a gender breakdown (ratio) for different products/categories and its change over time

Problem description

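I'm completely new here. After lurking on forum after forum looking for an answer to this problem, I decided to build it myself, but it turned out far too long. I'd appreciate any simpler solutions or alternative approaches. I'll try to be as detailed as possible, so brace yourself, this will be a long post. Here we go: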

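Problem: the client wants to know whether this can be done in SQL Server: what is the gender breakdown of each of our programs, and how has it changed over time during the last quarter of the year?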

They also provided the following column names: ClientID, ClientName, Program, StartDate, Gender, Location

In a tool like Excel, Tableau, or Power BI this isn't very complicated, but it got me thinking about how to do it in SQL.

So, for brevity, I first decided to create two programs: Program_A and Program_B.

Then I would build a pivot table and run a query whose output should look like this:

[Image: gender ratio over time]

Anyway, on to testing. First, I created a table named General and loaded it with some data (apologies in advance for the long, repetitive code block):

  CREATE TABLE General(
    ClientID int IDENTITY(1,1) NOT NULL,
    ClientName varchar(20) NOT NULL,
    Program varchar(20) NOT NULL,  
    StartDate date,
    Gender varchar(30) NULL, 
    Location varchar(30) NULL)


INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('John Doe', 'Program_A', '2020-10-01', 'Male', 'US')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Chewbaka Girl', 'Program_B', '2020-10-01', 'Female', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Jane Doe', 'Program_A', '2020-12-01', 'Male', 'UK')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ( 'Carol Smith', 'Program_A', '2020-11-01', 'Female', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Pedro Mostaza', 'Program_B', '2020-11-01', 'Male', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Jean Plurier', 'Program_A', '2020-12-01', 'Male', 'UK')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Nicole Kiteman', 'Program_A', '2020-12-01', 'Female', 'US')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Sonia Cepeda', 'Program_B', '2020-10-01', 'Female', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Alejandra Moncayo', 'Program_A', '2020-11-01', 'Female', 'UK')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Britanny Royce', 'Program_A', '2020-11-01', 'Female', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Arnold Lotfrey', 'Program_A', '2020-10-01', 'Male', 'US')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Richard Books', 'Program_B', '2020-11-01', 'Male', 'UK')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Camero Lovely', 'Program_B', '2020-11-01', 'Female', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Henry Lakes', 'Program_B', '2020-11-01', 'Male', 'UK')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Cameron Lovely', 'Program_B', '2020-12-01', 'Female', 'US')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Paula Mint', 'Program_A', '2020-11-01', 'Female', 'US')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Shirley Timer', 'Program_A', '2020-12-01', 'Female', 'CA')

INSERT [dbo].[General] ([ClientName], [Program], [StartDate], [Gender], [Location])
VALUES ('Andrew Rocks', 'Program_A', '2020-10-01', 'Female', 'CA')

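Second, I wrote a small query using a CTE (WITH) and CASE to split the Gender column into two separate columns, Male and Female, turning their values into counts so that percentages could be calculated later: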

;WITH CTE AS (
    -- one row per program and start date, with a head count per gender
    SELECT Program, StartDate,
           COUNT(CASE WHEN Gender = 'Male'   THEN 1 END) AS Male,
           COUNT(CASE WHEN Gender = 'Female' THEN 1 END) AS Female,
           COUNT(CASE WHEN Gender = '' OR Gender IS NULL THEN 1 END) AS NotAssigned
    FROM General
    GROUP BY Program, StartDate
)
SELECT Program, StartDate, Male, Female,
       -- NULLIF avoids a divide-by-zero error when a group has no Male/Female rows
       Male   * 100.0 / NULLIF(Male + Female, 0) AS Male_Ratio,
       Female * 100.0 / NULLIF(Male + Female, 0) AS Female_Ratio
INTO Program_GenderBreakdown
FROM CTE;
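As a quick check of the ratio formula, take Program_A in October 2020 from the sample data: John Doe and Arnold Lotfrey are Male and Andrew Rocks is Female, so the query above gives:

$$\text{Male\_Ratio} = \frac{2 \times 100.0}{2 + 1} \approx 66.67, \qquad \text{Female\_Ratio} = \frac{1 \times 100.0}{2 + 1} \approx 33.33$$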

Third, I pivoted the StartDate values into one column per month so the result is easier to read:

SELECT * INTO Results 
FROM (
  SELECT
    Program, Male_Ratio AS Percentage, StartDate, 'Male' AS Gender
  FROM Program_GenderBreakdown
) T --temporary name
PIVOT (
  SUM(Percentage)
  FOR StartDate
  IN (
    [2020-10-01], 
    [2020-11-01],
    [2020-12-01]
    )
) AS PvtMale
UNION ALL --then unite both male and female pivots
--query For female
SELECT * FROM (
  SELECT
    Program, Female_Ratio AS Percentage, StartDate,'Female' AS Gender
  FROM Program_GenderBreakdown
) T
PIVOT (
  SUM(Percentage)
  FOR StartDate
  IN (
    [2020-10-01], 
    [2020-11-01],
    [2020-12-01]
    )
) AS PvtFemale
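As a sketch of a simpler route, the per-gender percentages and the month columns can be produced in a single query with conditional aggregation instead of two PIVOTs joined by UNION ALL. Assumptions: the hypothetical Sample CTE copies the Program, StartDate and Gender values inserted above so the block runs on its own (against the real data you would read FROM General instead), and the month list is still hard-coded, as in the original. It only uses CTEs, CASE and a windowed SUM, so it should run on SQL Server as well as other engines with window functions.

```sql
-- Sketch: one query instead of two PIVOTs + UNION ALL.
-- Assumption: the hypothetical Sample CTE stands in for the General table.
WITH Sample (Program, StartDate, Gender) AS (
    SELECT 'Program_A', '2020-10-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-10-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Male'   UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Male'   UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Male'   UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Female' UNION ALL
    SELECT 'Program_B', '2020-10-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-10-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-12-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-10-01', 'Female'
),
-- Head count per program, month and gender (unassigned genders excluded)
Counts AS (
    SELECT Program, StartDate, Gender, COUNT(*) AS n
    FROM Sample
    WHERE Gender IN ('Male', 'Female')
    GROUP BY Program, StartDate, Gender
),
-- Each gender's share of its program/month total
Shares AS (
    SELECT Program, StartDate, Gender,
           100.0 * n / SUM(n) OVER (PARTITION BY Program, StartDate) AS pct
    FROM Counts
)
-- One row per program and gender, one column per month;
-- a gender with no clients in a month shows 0
SELECT Program, Gender,
       COALESCE(MAX(CASE WHEN StartDate = '2020-10-01' THEN pct END), 0) AS October,
       COALESCE(MAX(CASE WHEN StartDate = '2020-11-01' THEN pct END), 0) AS November,
       COALESCE(MAX(CASE WHEN StartDate = '2020-12-01' THEN pct END), 0) AS December
FROM Shares
GROUP BY Program, Gender
ORDER BY Program, Gender DESC;
```

With the sample data this should reproduce the pivot's numbers (for example, Program_B / Male comes out as 0, 75 and 0 for October to December). Because the denominator only counts Male and Female rows, clients with a blank or NULL Gender are left out, matching the Male + Female denominator of the CTE step; rounding can be added with CAST as in step four.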

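Fourth, I renamed the date columns to month names, used CAST to round the values to two decimal places (purely for looks), and computed the month-over-month (MoM) change: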

SELECT Program, Gender,
       CAST([2020-10-01] AS DECIMAL(19,2)) AS [October],
       CAST([2020-11-01] AS DECIMAL(19,2)) AS [November],
       CAST([2020-12-01] AS DECIMAL(19,2)) AS [December],
       -- MoM = later month / earlier month - 1; -1 is returned when the earlier month is 0
       CASE WHEN [2020-10-01] = 0 THEN -1
            ELSE CAST((([2020-11-01] / [2020-10-01]) - 1) AS DECIMAL(19,2)) END AS [MoMOct-Nov],
       CASE WHEN [2020-11-01] = 0 THEN -1
            ELSE CAST((([2020-12-01] / [2020-11-01]) - 1) AS DECIMAL(19,2)) END AS [MoMNov-Dec]
FROM Results  -- the table created by SELECT ... INTO in the PIVOT step
ORDER BY Program, Gender DESC;
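Working the MoM formula by hand on the sample data shows how to read these columns, and one pitfall: the -1 returned when the earlier month is 0 looks exactly like a genuine 100% drop.

$$\text{MoM}_{\text{Oct-Nov}} = \frac{\text{November}}{\text{October}} - 1$$

$$\text{Program\_A, Female: } \frac{100.00}{33.33} - 1 \approx 2.00 \quad (+200\%)$$

$$\text{Program\_A, Male: } \frac{0.00}{66.67} - 1 = -1.00 \quad (-100\%)$$

$$\text{Program\_B, Male: October} = 0.00 \;\Rightarrow\; \text{the CASE returns } -1$$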

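The result isn't what I had in mind: the table is laid out the opposite way from what I wanted, although the calculations work so far. Here is the final result: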

[Image: final result]

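Conclusion: although the calculations work, how can I make the table look like what I originally intended, and is it even worth it? Thanks in advance for any help. Even if you just want to stop by and give me feedback on this thread, it would be much appreciated.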

Tags: sql, sql-server, casting, data-analysis, percentage

Solution


You can do this with conditional aggregation, as follows:

-- Despite their names, MaleCount and FemaleCount are percentages of each program/month
SELECT CONCAT(Program, '_', DATENAME(month, StartDate)) AS ProgramMonth,
       CONVERT(decimal(5,2), (COUNT(CASE WHEN Gender = 'Male'   THEN 1 END) * 1.0 / COUNT(*)) * 100) AS MaleCount,
       CONVERT(decimal(5,2), (COUNT(CASE WHEN Gender = 'Female' THEN 1 END) * 1.0 / COUNT(*)) * 100) AS FemaleCount
FROM General
GROUP BY CONCAT(Program, '_', DATENAME(month, StartDate))
ORDER BY ProgramMonth;

ProgramMonth          MaleCount   FemaleCount
Program_A_December    50.00       50.00
Program_A_November    0.00        100.00
Program_A_October     66.67       33.33
Program_B_December    0.00        100.00
Program_B_November    75.00       25.00
Program_B_October     0.00        100.00
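To show the change over time without hard-coding one column per month, the same conditional aggregation can be combined with LAG, which compares each month with the previous one within the same program. This is a sketch, assuming the hypothetical Sample CTE stands in for General and that a change in percentage points (current share minus previous share) is acceptable in place of the question's ratio-minus-one; LAG requires SQL Server 2012 or later.

```sql
-- Sketch: month-over-month change in each program's gender mix,
-- one row per program and month, so new months need no query changes.
-- Assumption: the hypothetical Sample CTE stands in for the General table.
WITH Sample (Program, StartDate, Gender) AS (
    SELECT 'Program_A', '2020-10-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-10-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Male'   UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Male'   UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Male'   UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Female' UNION ALL
    SELECT 'Program_B', '2020-10-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-10-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_B', '2020-11-01', 'Male'   UNION ALL
    SELECT 'Program_B', '2020-12-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-11-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-12-01', 'Female' UNION ALL
    SELECT 'Program_A', '2020-10-01', 'Female'
),
-- Share of each gender per program and month (same idea as the answer)
Monthly AS (
    SELECT Program, StartDate,
           100.0 * SUM(CASE WHEN Gender = 'Male'   THEN 1 ELSE 0 END) / COUNT(*) AS MalePct,
           100.0 * SUM(CASE WHEN Gender = 'Female' THEN 1 ELSE 0 END) / COUNT(*) AS FemalePct
    FROM Sample
    GROUP BY Program, StartDate
)
-- LAG fetches the previous month's share within the same program;
-- it is NULL for each program's first month, so the change is NULL there.
SELECT Program, StartDate, MalePct, FemalePct,
       MalePct   - LAG(MalePct)   OVER (PARTITION BY Program ORDER BY StartDate) AS MalePctChange,
       FemalePct - LAG(FemalePct) OVER (PARTITION BY Program ORDER BY StartDate) AS FemalePctChange
FROM Monthly
ORDER BY Program, StartDate;
```

For Program_A, FemalePct goes from about 33.33 to 100 to 50, so FemalePctChange should read NULL, then roughly +66.67, then -50. Keeping months as rows also sidesteps the -1 sentinel, since a change in points is well defined even when the previous share is 0.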
