excel - 从包含 CSV 文件的文件夹中筛选 Power Query 中的多个条件的方法
问题描述
我需要您的帮助来纠正/建议我用来从 CSV 格式的文件夹中获取数据的查询。预先警告:我不知道,如何尽快写这个。
先说几个信息:
工具仅限于 Power Query、Excel、VBA
数据查询每月运行一次,因此较长的加载时间不是大问题,尽管较短的时间是更可取的
- 我选择了 Power Query 方法,因为源数据必须在另一个 Excel 文件中使用,但具有不同的规则集(这是我当前问题的一部分)。
- 我的代码的基本问题是它运行了很长时间,需要满足大量条件,并且出于另一个原因/工具/文件,我必须使用类似的方法。我希望人们只需按刷新即可获得所需的信息。
描述:
我在文件夹中的 CSV 文件中有数据源。命名约定不存在,因为多人从系统中导出数据。因此,我在 PQ 中使用了文件夹选项。
数据的大小目前约为 400-600 MB。列的名称可能会发生变化,这是 M 代码中要绕过的第一行。
我的主要斗争是:
有几个条件,需要实现。我不想写多个if
语句,因为代码会变得非常难看,而且条件的数量是十分之一并且跨越多个列。出于这个原因,我实现了(我们称之为 TT)转换表,其中我有所有可以使用过滤的列,并且该 TT 的最后一列是所有列的串联。如果在我不关心其中一列的情况下,我用通配符“*”填充它。
所以 TT 可能看起来像:
| PC | CLIENT | FN | TC | STRING |
|----|--------|-----|----|-------------|
| 11 | * | NEW | AC | 11*NEWAC |
| 47 | 000001 | NEW | * | 47000001NEW*|
ETC...
PC 是 PoC,FN 是 FUNCTION,TC 是交易代码(在下面的代码中)。
然后在代码中,我用 PQ 中的适当列的值替换通配符,并检查 PQ 中相同列的连接字符串是否包含在 TT 中(最后一列被制成列表)。下面的代码适用于更简单的解决方案,但它非常硬编码,因为我想知道它是否可能。
数据更新后,我运行 VBA 宏将数据附加到“数据库”表中(ofc 检查是否存在现有值),因此可以最大限度地减少数据负载。出于这个原因,使用了第一部分代码。基本上我可以将代码分为三个部分:
- 基本转换:从文件夹加载,摆脱非常规名称并检查其他文件夹是否包含相同命名的文件以最大程度地减少负载。
- 过滤数据:包括将 PQ 表与 TT 表合并,用正确的列替换通配符,然后创建过滤字符串以检查连接的 PQ 表中的文本是否包含 TT 列表中的至少一个值。
- 使用数据的最终转换以获得我需要的信息(主要是关于来自市场的后期结算)
带有注释的完整 M 代码
let
/*Here starts basic data transformation to limit errors in CSV files due to
different conventions */
Source = Folder.Files(source),
#"Uppercased Text1" = Table.TransformColumns(Source,{{"Name", Text.Upper, type text}}),
#"Merged Queries2" = Table.NestedJoin(#"Uppercased Text1", {"Name"}, q_Archive, {"Name"}, "q_Archive", JoinKind.LeftAnti),
#"Added Custom" = Table.AddColumn(#"Merged Queries2", "Data", each Csv.Document(File.Contents([Folder Path] & "\" & [Name]),[Delimiter=";", Encoding = 1252, QuoteStyle = QuoteStyle.None])),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Data"}),
#"Added Custom1" = Table.AddColumn(#"Removed Other Columns", "Table", each Table.PromoteHeaders([Data])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Table"}),
#"Added Custom2" = Table.AddColumn(#"Removed Other Columns1", "Upper", each Table.TransformColumnNames([Table],Text.Upper)),
#"Removed Other Columns2" = Table.SelectColumns(#"Added Custom2",{"Upper"}),
#"Expanded Upper" = Table.ExpandTableColumn(#"Removed Other Columns2", "Upper", {"19A AMOUNT", "19A CURRENCY CODE", "35B ISIN", "CLIENT", "EXP.SETTL.DATE", "FUNCTION", "INSTR.ID", "MESSAGE FUNCTION", "POC", "RECEPTION DATE", "SETTL.AMOUNT", "SETTL.CUR.", "TRANSACTION CODE"}, {"19A AMOUNT", "19A CURRENCY CODE", "35B ISIN", "CLIENT", "EXP.SETTL.DATE", "FUNCTION", "INSTR.ID", "MESSAGE FUNCTION", "POC", "RECEPTION DATE", "SETTL.AMOUNT", "SETTL.CUR.", "TRANSACTION CODE"}),
#"Renamed Columns1" = Table.RenameColumns(#"Expanded Upper",{{"SETTL.AMOUNT", "SETTL.AMOUNT2"}, {"SETTL.CUR.", "SETTL.CUR.2"}, {"19A CURRENCY CODE", "19A CURRENCY CODE2"}, {"19A AMOUNT", "19A AMOUNT2"}}),
#"Added Custom10" = Table.AddColumn(#"Renamed Columns1", "19A AMOUNT", each if[SETTL.AMOUNT2]=null then [19A AMOUNT2] else [SETTL.AMOUNT2]),
#"Added Custom11" = Table.AddColumn(#"Added Custom10", "19A CURRENCY CODE", each if [SETTL.CUR.2] = null then [19A CURRENCY CODE2] else [SETTL.CUR.2]),
#"Renamed Columns" = Table.RenameColumns(#"Added Custom11",{{"FUNCTION", "FUNCTION2"}}),
#"Added Custom8" = Table.AddColumn(#"Renamed Columns", "FUNCTION", each if[FUNCTION2]=null then [MESSAGE FUNCTION] else[FUNCTION2]),
#"Removed Other Columns3" = Table.SelectColumns(#"Added Custom8",{"35B ISIN", "CLIENT", "EXP.SETTL.DATE", "INSTR.ID", "POC", "RECEPTION DATE", "TRANSACTION CODE", "19A AMOUNT", "19A CURRENCY CODE", "FUNCTION"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns3",{"POC", "CLIENT", "FUNCTION", "TRANSACTION CODE", "EXP.SETTL.DATE", "RECEPTION DATE", "19A AMOUNT", "19A CURRENCY CODE"}),
#"Replaced Value" = Table.ReplaceValue(#"Reordered Columns","""","",Replacer.ReplaceText,{"POC", "CLIENT", "INSTR.ID", "35B ISIN"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","=","",Replacer.ReplaceText,{"POC", "CLIENT", "INSTR.ID", "35B ISIN"}),
#"Uppercased Text" = Table.TransformColumns(#"Replaced Value1",{{"POC", Text.Upper, type text}, {"CLIENT", Text.Upper, type text}, {"FUNCTION", Text.Upper, type text}, {"TRANSACTION CODE", Text.Upper, type text}}),
#"Filtered Rows" = Table.SelectRows(#"Uppercased Text", each ([FUNCTION] = "NEWM")),
#"Merged Queries" = Table.NestedJoin(#"Filtered Rows", {"POC"}, tbl_setup_pocList, {"PocList"}, "tbl_setup_pocList", JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"tbl_setup_pocList"}),
/* Here ends the data transformation part
and the part for list transformations start*/
#"Added condition" = Table.AddColumn(#"Removed Columns","COND", each (
((Table.FromRecords({
[PC = List.ReplaceValue(Table.Column(tbl_filtering_string, "POC"),"*",[POC], Replacer.ReplaceText),
CL = List.ReplaceValue(Table.Column(tbl_filtering_string, "CLIENT"),"*",[CLIENT], Replacer.ReplaceText),
FN = List.ReplaceValue(Table.Column(tbl_filtering_string, "FUNCTION"),"*",[FUNCTION], Replacer.ReplaceText),
TC = List.ReplaceValue(Table.Column(tbl_filtering_string, "TRANSACTION CODE"),"*",[TRANSACTION CODE], Replacer.ReplaceText)]}
))))),
#"Expanded COND" = Table.ExpandTableColumn(#"Added condition", "COND", {"PC", "CL", "FN", "TC"}, {"PC", "CL", "FN", "TC"}),
#"Added Custom3" = Table.AddColumn(#"Expanded COND", "Test", each (List.Combine(
{
{_[PC]},{_[CL]},{_[FN]},{_[TC]}
}
))),
#"Expanded Test" = Table.AddColumn(#"Added Custom3", "Test2", each (Table.FromColumns(_[Test],null))),
#"Removed Columns2" = Table.RemoveColumns(#"Expanded Test",{"PC", "CL", "FN", "TC", "Test"}),
#"Added Custom4" = Table.AddColumn(#"Removed Columns2", "String", each Table.ToList([Test2],Combiner.CombineTextByDelimiter(""))),
#"Removed Columns3" = Table.RemoveColumns(#"Added Custom4",{"Test2"}),
#"Added Custom6" = Table.AddColumn(#"Removed Columns3", "CONTAIN_STR", each [POC]&[CLIENT]&[FUNCTION]&[TRANSACTION CODE]),
#"Added Custom5" = Table.AddColumn(#"Added Custom6", "Cond", each List.Contains(_[String],[CONTAIN_STR])),
#"Filtered Rows1" = Table.SelectRows(#"Added Custom5", each ([Cond] = false)),
/*Here the code for filtering ends and final transformations occur */
#"Removed Columns4" = Table.RemoveColumns(#"Filtered Rows1",{"String", "CONTAIN_STR", "Cond"}),
#"Merged Queries1" = Table.NestedJoin(#"Removed Columns4", {"POC"}, tbl_setup_exotics, {"Exotic_PoC"}, "tbl_setup_exotics", JoinKind.LeftOuter),
#"Expanded tbl_setup_exotics" = Table.ExpandTableColumn(#"Merged Queries1", "tbl_setup_exotics", {"Exotic_PoC"}, {"Exotic_PoC"}),
#"Replaced Value2" = Table.ReplaceValue(#"Expanded tbl_setup_exotics",null, "Non Exotic",Replacer.ReplaceValue,{"Exotic_PoC"}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Replaced Value2", {"EXP.SETTL.DATE", "RECEPTION DATE"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Errors",{{"EXP.SETTL.DATE", type date}, {"RECEPTION DATE", type date}}),
#"Added Custom7" = Table.AddColumn(#"Changed Type", "RD", each (if [Exotic_PoC] <> "Non Exotic" then Date.AddDays([RECEPTION DATE],1)else [RECEPTION DATE])),
#"Filtered Rows2" = Table.AddColumn(#"Added Custom7", "LB" , each if [RD]>=[EXP.SETTL.DATE] then "Late" else "Not"),
#"Added Custom9" = Table.AddColumn(#"Filtered Rows2", "DAYS_LATE", each [RD]-[EXP.SETTL.DATE]),
#"Inserted Year" = Table.AddColumn(#"Added Custom9", "Year", each Date.Year([EXP.SETTL.DATE]), Int64.Type),
#"Inserted Month" = Table.AddColumn(#"Inserted Year", "Month", each Date.Month([EXP.SETTL.DATE]), Int64.Type),
#"Changed Type1" = Table.TransformColumnTypes(#"Inserted Month",{{"19A AMOUNT", type number}}),
#"Grouped Rows" = Table.Group(#"Changed Type1", {"Year", "Month", "POC", "19A CURRENCY CODE", "DAYS_LATE", "LB"}, {{"Count", each Table.RowCount(_), type number}, {"Countervalue", each List.Sum([19A AMOUNT]), type text}, {"ISIN", each Text.Combine([35B ISIN],";"), type text}, {"INSTR.ID", each Text.Combine([INSTR.ID], ";"), type text}}),
#"Merged Queries3" = Table.NestedJoin(#"Grouped Rows", {"Year", "Month", "19A CURRENCY CODE"}, q_Xrates, {"Year", "Month", "Currency"}, "q_Xrates", JoinKind.LeftOuter),
#"Expanded q_Xrates" = Table.ExpandTableColumn(#"Merged Queries3", "q_Xrates", {"Rate"}, {"Rate"}),
#"Replaced Value3" = Table.ReplaceValue(#"Expanded q_Xrates",null,1,Replacer.ReplaceValue,{"Rate"}),
#"Added Col" = Table.AddColumn(#"Replaced Value3", "CV", each [Countervalue]/[Rate]),
#"Remove Countervalue" = Table.RemoveColumns(#"Added Col", {"Countervalue"})
in
#"Remove Countervalue"
问题
- 我知道这种方法听起来过于复杂,但它确实有效(不幸的是,它需要很长时间才能刷新)。但这真的好吗?考虑到开头提到的有限工具使用,没有其他选择吗?
- 我怎样才能使这段代码更好?我相信它可以部分重新制作成功能,但由于我是 PQ 的初学者,我无法想象如何。
- 如何对相同的源数据使用相同的方法,但复杂性更高?您可以将其理解为要添加到过滤字符串中的更多列。
- 您还有其他建议吗?
结束评论
- 我现在非常绝望,有时我的书面文字可能会令人困惑。
- 提供某种 Visio 图表以更图形化的方式显示我的逻辑(我对此更熟悉)以及关系概述,我没有任何问题。
- 我也没有提供匿名数据的问题(因为它可能是部分机密的)。如果您需要该服务,请参阅首选服务。
- 如果我被推向正确的方向,我不介意处理我的代码。对于那个问题。#1 是优先级。所以基本上这是一种很好的方法,它可以很容易地调整到另一个相同但更复杂的目的吗?
我真的很感谢你的时间。
*/ MK */
解决方案
如果我要这样做,我会编写一个函数,将过滤条件表编译成一个函数,然后使用 Table.SelectRows 应用它。
// Compile the condition table into a function that can be applied in row filtering.
filterCondition = compileFilterConditionTable(tbl_filtering_string),
#"Filtered Rows" = Table.SelectRows(#"Table after Preceding Steps", filterCondition)
这看起来不是更容易追踪这些步骤吗?
下面是一个将条件表编译为逻辑函数的函数的示例代码。我不确定这是否适合您的情况,因为我不完全理解要求。
compileFilterConditionTable =
let compileFilterConditionTable = (filterConditionTable as table) as function =>
let recordConditions = List.Transform(
Table.ToRecords(filterConditionTable),
compileFilterConditionRecord)
in applyCombine(recordConditions, List.AnyTrue),
compileFilterConditionRecord = (cond as record) as function =>
let fieldNameValues = List.Transform(
Record.FieldNames(cond),
each [Name = _, Value = Record.Field(cond, Name)]
),
fieldConditions = List.Transform(fieldNameValues, compileFieldCondition)
in applyCombine(fieldConditions, List.AllTrue),
compileFieldCondition = (fieldNameValue as record) as function =>
let name = fieldNameValue[Name],
value = fieldNameValue[Value]
in
if value = "*" then (record as record) as logical => true
else (record as record) as logical => Record.Field(record, name) = value,
applyCombine = (functions as list, combiner as function) as function =>
(value) => combiner(List.Transform(functions, (f) => f(value)))
in compileFilterConditionTable
无论如何,M 是一种函数式编程语言,因此以函数式的方式思考和编码它会有所帮助。把整个逻辑分解成小部分,让每个小部分都容易理解。将您的代码编写为可重用的小函数,并将它们组合起来构建整体。
推荐阅读
- reactjs - 使用 useCallback 重构 useEffect 钩子 - React
- javascript - Three.js OrbitControls 围绕环旋转
- reactjs - React:上下文挂钩,将对象推入数组
- keras - 为什么 CuDNNLSTM 在 keras 中比 LSTM 有更多的参数?
- regex - 0 到 299 之间的字符串的正则表达式,后跟 x 和 y 的某种组合
- python - 如何模拟请求中的按钮单击?
- android - 如何获取具有这种格式的 URI content://xx.xxx.xxx.file_provider/file%253A%2...等
- python - ElementNotInteractableException:将文本发送到电子邮件字段的元素不可交互错误
- jquery - jQuery Ajax GET 请求在某些情况下返回 500 错误
- reactjs - React Private Route 和 hooks context 和 reducer