powerbi - Power Query 组合三个外部 Excel 源文件并附加特定列
问题描述
我正在尝试创建一个结合我的 3 个源文件主键列的查找表,这样我就不必进行外部连接来查找每个源中丢失的记录,然后将它们附加在一起。我找到了如何“组合”两个源文件,但我不知道如何钻取列/字段列表,以便我只能选择第 1 列(或 Excel 文件中的“项目代码”标题名称)。
这是我迄今为止合并 2/3 个文件的代码(作为试用版):
let
Source = Table.Combine({Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_MRP_Dmd.xlsx"), null, true),
Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_Open_PO.xlsx"), null, true)})
in Source
解决方案
如果您有一个不太理想的数据源(即大量不相关的列,您想要的数据中的重复项),那么避免实现一大堆不必要数据的一种方法是对嵌套表格单元格执行所有转换/过滤而不是加载所有数据只是为了删除列/重复。
下面的 M 代码应该是一个粗略的开始,希望能让你上路
let
//Adjust the Source step to refer to the relevant folder your 3 source files are saved in
Source = Folder.Files("CC:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources"),
//Filter the file list to leave just your 3 source files if required
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
//Remove all columns excep the Binary file column
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
//Convert the binary file to the file data ie sheets, tables, named ranges etc - the same data you get when you use a file as a source
#"Workbook Data" = Table.TransformColumns(#"Removed Other Columns",{"Content", each Excel.Workbook(_)}),
//Filter the nested file data table cell to select the sheet you need from your source files - may not be necessary depending on what's in the files
#"Sheet Filter" = Table.TransformColumns(#"Workbook Data",{"Content", each Table.SelectRows(_, each [Name] = "Sheet1")}),
//Step to Name the column you want to extract data from
#"Column Name" = "Column1",
//Extract a List of the values in the specified column
#"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each List.Distinct(Table.Column(_{0}[Data],#"Column Name"))}),
//Expand all the lists
#"Expanded Content" = Table.ExpandListColumn(#"Column Values", "Content"),
#"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
#"Removed Duplicates"
编辑
要选择多列并提供不同的行,您可以更改从#"Column Name"
根据您拥有的数据量,这最终可能比上一步花费更长的时间,但它应该可以完成这项工作
//Step to Name the column you want to extract data from
#"Column Name" = {"Column1","Column2","Column5"},
//Extract a List of the values in the specified column
#"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each Table.SelectColumns(_{0}[Data],#"Column Name")}),
//In each nested table, filter down to distinct rows
#"Distinct rows in Nested Tables" = Table.TransformColumns(#"Column Values",{"Content", each Table.Distinct(_)}),
//Expand nested table column
#"Expanded Content" = Table.ExpandTableColumn(#"Distinct rows in Nested Tables", "Content", #"Column Name"),
//Remove Duplicates in combined table
#"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
#"Removed Duplicates"
推荐阅读
- r - 从文本字符串中获取字符串的唯一计数
- dynamics-crm - Teams 与 Microsoft Dynamics CRM 365 的集成
- php - php 获取总和数组,但有“通知”未定义的偏移量:0
- bash - rsync 将内容从 joe@remote:/foo/bar/baz 复制到 b 而不创建 b/foo/bar/a
- python - 如何将 Google Cloud Firestore 本地模拟器用于 python 和测试目的
- css - CSS段落从下到上
- javascript - 如何取消重音符号的onkeydown
- c++ - 如果与以前没有不同,则谷歌记录单个输出
- elasticsearch - 将 index.max_result_window 增加到 10000000。有问题吗?
- centos7 - 在 CentOS 7 上使用 Letsencrypt 的 Net Core 2.2