excel - Add a running total to a PowerQuery result
问题描述
I am trying to create a count of dates that occur during particular weeks (the last five weeks including this week) and a total running count for dates which would include records prior to the listed five week duration.
I could do this fairly easily using standard Excel formula, but I'm trying to learn PowerQuery and the M
language.
My first query returns the relevant rows from the original source data (another query that pulls everything from the Excel table and forces the correct data type for each column).
The dates I'm trying to count are contained in the Section A#(lf)Turnbacks.
The query is called Triage_Turnbacks.
let
Source = #"Source_Data",
#"Filter_Rows" = Table.SelectRows(Source, each ([#"Cancelled?"] = null) and ([#"Section A#(lf)Turnbacks"] <> null)),
#"Removed_Columns" = Table.SelectColumns(Filter_Rows,{"VENDOR CODE", "Supplier Lookup", "Section A#(lf)Turnbacks"}),
#"Duplicate_Date_Column" = Table.DuplicateColumn(Removed_Columns, "Section A#(lf)Turnbacks", "Start_Of_Week"),
#"Start_of_Week" = Table.TransformColumns(#"Duplicate_Date_Column",{{"Start_Of_Week", Date.StartOfWeek, type date}})
in
Start_of_Week
| VENDOR CODE | Supplier Lookup | Section ATurnbacks | Start_Of_Week |
|-------------|-----------------|--------------------|---------------|
| A1 | Supplier A | 08/04/2019 | 08/04/2019 |
| A1 | Supplier A | 08/04/2019 | 08/04/2019 |
| A2 | Supplier B | 16/04/2019 | 15/04/2019 |
| A3 | Supplier C | 30/05/2019 | 27/05/2019 |
| A10 | Supplier D | 24/04/2019 | 22/04/2019 |
| A5 | Supplier E | 18/06/2019 | 17/06/2019 |
| A1 | Supplier A | 17/07/2019 | 15/07/2019 |
| A1 | Supplier A | 24/06/2019 | 24/06/2019 |
| A2 | Supplier B | 11/06/2019 | 10/06/2019 |
| A3 | Supplier C | 03/07/2019 | 01/07/2019 |
| A10 | Supplier D | 02/07/2019 | 01/07/2019 |
| A5 | Supplier E | 17/07/2019 | 15/07/2019 |
| A1 | Supplier A | 17/07/2019 | 15/07/2019 |
| A1 | Supplier A | 05/08/2019 | 05/08/2019 |
| A2 | Supplier B | 12/07/2019 | 08/07/2019 |
| A1 | Supplier A | 05/08/2019 | 05/08/2019 |
| A1 | Supplier A | 29/07/2019 | 29/07/2019 |
| A2 | Supplier B | 12/06/2019 | 10/06/2019 |
| A3 | Supplier C | 21/02/2019 | 18/02/2019 |
| A10 | Supplier D | 23/04/2019 | 22/04/2019 |
| A5 | Supplier E | 14/06/2019 | 10/06/2019 |
| A10 | Supplier D | 08/02/2019 | 04/02/2019 |
A second query returns the previous five week Monday and Sunday dates.
The query is called 5_Week_Calendar.
let
Source = Table.FromList({-5..0}, each{_}),
AddedStartOfWeek = Table.AddColumn(Source, "StartOfWeek", each Date.StartOfWeek(Date.AddDays(Date.From(DateTime.LocalNow()),[Column1]*7)), type date),
AddedEndOfWeek = Table.AddColumn(AddedStartOfWeek, "EndOfWeek", each Date.EndOfWeek([StartOfWeek]), type date),
RemovedColumn = Table.RemoveColumns(AddedEndOfWeek,{"Column1"})
in RemovedColumn
| StartOfWeek | EndOfWeek |
|-------------|------------|
| 01/07/2019 | 07/07/2019 |
| 08/07/2019 | 14/07/2019 |
| 15/07/2019 | 21/07/2019 |
| 22/07/2019 | 28/07/2019 |
| 29/07/2019 | 04/08/2019 |
| 05/08/2019 | 11/08/2019 |
My third query returns the count per week (i.e. which records in Triage_Turnbacks.Start_Of_Week matches 5_Week_Calendar.StartOfWeek.
let
Source = Table.NestedJoin(#"5_Week_Calendar", {"StartOfWeek"}, Triage_Turnbacks, {"Start_Of_Week"}, "Triage_Turnbacks", JoinKind.LeftOuter),
#"Removed Columns" = Table.RemoveColumns(Source,{"EndOfWeek"}),
#"Aggregated Triage_Turnbacks" = Table.AggregateTableColumn(#"Removed Columns", "Triage_Turnbacks", {{"Start_Of_Week", List.Count, "Total Turnbacks"}})
in
#"Aggregated Triage_Turnbacks"
| StartOfWeek | Total Turnbacks |
|-------------|-----------------|
| 01/07/2019 | 2 |
| 08/07/2019 | 1 |
| 15/07/2019 | 3 |
| 22/07/2019 | 1 |
| 29/07/2019 | 1 |
| 05/08/2019 | 2 |
The problem
How do I add the running total which includes records before the initial 5 week date of 01/07/2019?
| StartOfWeek | Total Turnbacks | Running Total |
|-------------|-----------------|---------------|
| 01/07/2019 | 2 | 15 |
| 08/07/2019 | 1 | 16 |
| 15/07/2019 | 3 | 19 |
| 22/07/2019 | 1 | 19 |
| 29/07/2019 | 1 | 20 |
| 05/08/2019 | 2 | 22 |
解决方案
运行总计在概念上很简单,但据我所知,M
的标准库目前不提供类似List.CumulativeSum
或Table.CumulativeSum
. 这让事情变得有点尴尬,并且可能需要一个自定义函数。
运行总计可以通过以下方式实现:
ListCumulativeSum = (listOfNumbers as list) as list => List.Accumulate(listOfNumbers, {}, (listState, currentItem) => listState & {List.Last(listState, 0) + currentItem}),
其中,给定一个数字列表,返回一个数字列表。
表的等价物可能是下面的函数。(我相信有很多方法可以做到这一点;有些比其他的性能/可扩展性更高。)
TableCumulativeSum = (someTable as table, columnToSum as text, newColumnName as text) =>
let
runningTotal = ListCumulativeSum(Table.Column(someTable, columnToSum)),
rightTable = Table.FromColumns({{0..List.Count(runningTotal)-1}, runningTotal}, {"$rightJoinKey", newColumnName}),
leftTable = Table.AddIndexColumn(someTable, "$leftJoinKey"),
joined = Table.Join(leftTable, {"$leftJoinKey"}, rightTable, {"$rightJoinKey"}, JoinKind.FullOuter),
dropped = Table.RemoveColumns(joined, {"$leftJoinKey", "$rightJoinKey"})
in dropped,
总的来说,回到你的问题,我认为可以通过下面的代码来实现:
let
// This is just a function which returns a record. The record itself
// contains two values:
// • a list of dates (which will be the "left join column")
// • a replacer function (which conditionally replaces dates)
GetReplacerAndDates = (n as number, replaceWith as any) as record =>
let
startOfCurrentWeek = Date.StartOfWeek(DateTime.Date(DateTime.LocalNow())),
nMondaysAgo = Date.AddWeeks(startOfCurrentWeek, -n),
defaultAndLastNWeeks = {replaceWith} & List.Dates(nMondaysAgo, n + 1, #duration(7, 0, 0, 0)),
conditionalReplacer = (someDate as date) =>
let
startOfWeek = Date.StartOfWeek(someDate),
startOfWeekOrReplacement = if startOfWeek >= nMondaysAgo then startOfWeek else replaceWith
in startOfWeekOrReplacement,
toReturn = [replacer = conditionalReplacer, values = defaultAndLastNWeeks]
in toReturn,
rec = GetReplacerAndDates(5, "Prior period"), // Needs a better variable name.
// You don't need this below, I only used it to give me a starting point
// and so that I could verify my answer.
triageTurnbacks = Table.FromColumns(
{
{"A1","A1","A2","A3","A10","A5","A1","A1","A2","A3","A10","A5","A1","A1","A2","A1","A1","A2","A3","A10","A5","A10"},
{"Supplier A","Supplier A","Supplier B","Supplier C","Supplier D","Supplier E","Supplier A","Supplier A","Supplier B","Supplier C","Supplier D","Supplier E","Supplier A","Supplier A","Supplier B","Supplier A","Supplier A","Supplier B","Supplier C","Supplier D","Supplier E","Supplier D"},
{#date(2019,4,8), #date(2019,4,8), #date(2019,4,16), #date(2019,5,30), #date(2019,4,24), #date(2019,6,18), #date(2019,7,17), #date(2019,6,24), #date(2019,6,11), #date(2019,7,3), #date(2019,7,2), #date(2019,7,17), #date(2019,7,17), #date(2019,8,5), #date(2019,7,12), #date(2019,8,5), #date(2019,7,29), #date(2019,6,12), #date(2019,2,21), #date(2019,4,23), #date(2019,6,14), #date(2019,2,8)}
},
type table [Vendor Code = text, Supplier Lookup = text, Section A Turnbacks = date]
),
// I think the single step "startOfWeekColumn" below should probably replace the #"Duplicate_Date_Column"
// and #"Start_of_Week" steps of your Triage_Turnbacks query.
// You could also assign rec[replacer] to a variable,
// just to make the code/call site easier to read.
startOfWeekColumn = Table.AddColumn(triageTurnbacks, "startOfWeek", each rec[replacer]([Section A Turnbacks]), type any),
datesToJoin = Table.FromColumns({rec[values]}, {"startOfWeek"}),
joined = Table.NestedJoin(datesToJoin, "startOfWeek", startOfWeekColumn, "startOfWeek", "Total Turnbacks", JoinKind.LeftOuter),
reduced = Table.TransformColumns(joined, {{"Total Turnbacks", Table.RowCount, type number}}),
// Helper functions. I don't think cumulative summation can be achieved natively (as far as I know).
ListCumulativeSum = (listOfNumbers as list) as list => List.Accumulate(listOfNumbers, {}, (listState, currentItem) => listState & {List.Last(listState, 0) + currentItem}),
TableCumulativeSum = (someTable as table, columnToSum as text, newColumnName as text) as table =>
let
runningTotal = ListCumulativeSum(Table.Column(someTable, columnToSum)),
rightTable = Table.FromColumns({{0..List.Count(runningTotal)-1}, runningTotal}, {"$rightJoinKey", newColumnName}),
leftTable = Table.AddIndexColumn(someTable, "$leftJoinKey"),
joined = Table.Join(leftTable, {"$leftJoinKey"}, rightTable, {"$rightJoinKey"}, JoinKind.FullOuter),
dropped = Table.RemoveColumns(joined, {"$leftJoinKey", "$rightJoinKey"}),
changedTypes = Table.TransformColumnTypes(dropped, {{newColumnName, type number}})
in changedTypes,
runningTotal = TableCumulativeSum(reduced, "Total Turnbacks", "Running Total")
in
runningTotal
这给了我这个:
我认为这与您的预期输出相匹配,但Total Turnbacks
开始的一周的列除外22/07/2019
(您拥有1
和我拥有的列0
)。
您显然可以"Prior period"
从最终表格中过滤/删除该行。有关详细信息,请参阅代码中的注释。如果他们没有帮助,请告诉我。
推荐阅读
- .net - 将看起来像日期的 DateTime 或 String 类型作为参数传递给 URL / (.net webapi)
- python - 创建 boto3 sts 客户端时出现无效的标头值错误
- blueprint - 在 Popover2 中使用表单输入
- firebase - 即使对经过身份验证的用户使用最简单的规则,Firebase 实时数据库权限也会被拒绝
- r - 仍然无法将 MP4 嵌入闪亮
- sql-server - 以编程方式更改 PowerBI 中的订阅
- api - express-api 从 localhost 获取 /inform 帖子
- java - 如何解决构造函数上的错误?
- reactjs - 如何结合多重 apollo 查询/突变和其他 api 调用?
- python - 如何从 Rstudio 调用需要命令行参数(非静态)的 python 脚本?