首页 > 解决方案 > Power Query 组合三个外部 Excel 源文件并附加特定列

问题描述

我正在尝试创建一个结合我的 3 个源文件主键列的查找表,这样我就不必进行外部连接来查找每个源中丢失的记录,然后将它们附加在一起。我找到了如何“组合”两个源文件,但我不知道如何钻取列/字段列表,以便我只能选择第 1 列(或 Excel 文件中的“项目代码”标题名称)。

这是我迄今为止合并 2/3 个文件的代码(作为试用版):

let
Source = Table.Combine({Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_MRP_Dmd.xlsx"), null, true), 

Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_Open_PO.xlsx"), null, true)})

in Source

标签: powerbipowerquerym

解决方案


如果您有一个不太理想的数据源(即大量不相关的列,您想要的数据中的重复项),那么避免实现一大堆不必要数据的一种方法是对嵌套表格单元格执行所有转换/过滤而不是加载所有数据只是为了删除列/重复。

下面的 M 代码应该是一个粗略的开始,希望能让你上路

let
    //Adjust the Source step to refer to the relevant folder your 3 source files are saved in
    Source = Folder.Files("CC:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources"),

    //Filter the file list to leave just your 3 source files if required
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),

    //Remove all columns excep the Binary file column
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),

    //Convert the binary file to the file data ie sheets, tables, named ranges etc - the same data you get when you use a file as a source
    #"Workbook Data" = Table.TransformColumns(#"Removed Other Columns",{"Content", each Excel.Workbook(_)}), 

    //Filter the nested file data table cell to select the sheet you need from your source files - may not be necessary depending on what's in the files
    #"Sheet Filter" = Table.TransformColumns(#"Workbook Data",{"Content", each Table.SelectRows(_, each [Name] = "Sheet1")}),     

    //Step to Name the column you want to extract data from
    #"Column Name" = "Column1",

    //Extract a List of the values in the specified column
    #"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each List.Distinct(Table.Column(_{0}[Data],#"Column Name"))}), 

    //Expand all the lists
    #"Expanded Content" = Table.ExpandListColumn(#"Column Values", "Content"),

    #"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
    #"Removed Duplicates"

编辑 要选择多列并提供不同的行,您可以更改从#"Column Name"

根据您拥有的数据量,这最终可能比上一步花费更长的时间,但它应该可以完成这项工作

    //Step to Name the column you want to extract data from
    #"Column Name" = {"Column1","Column2","Column5"},

    //Extract a List of the values in the specified column
    #"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each Table.SelectColumns(_{0}[Data],#"Column Name")}),

    //In each nested table, filter down to distinct rows
    #"Distinct rows in Nested Tables" = Table.TransformColumns(#"Column Values",{"Content", each Table.Distinct(_)}),

    //Expand nested table column
    #"Expanded Content" = Table.ExpandTableColumn(#"Distinct rows in Nested Tables", "Content", #"Column Name"), 

    //Remove Duplicates in combined table
    #"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
    #"Removed Duplicates"

推荐阅读