首页 > 解决方案 > 在excel或Power BI中使用“分隔符分隔”功能后,如何解决列中数据不一致和杂乱无章的问题?

问题描述

!!这是我对堆栈溢出的第一个问题,所以我提前为任何模棱两可的陈述道歉!

问题:由于缺少信息,每列中的数据不一致且无组织input data

我将使用的术语:

问题


“我正在 Power BI 中处理数据集以正确构建它”

Power BI中Dataset的输入数据
--查看Power BI中Dataset的输入数据--

Power BI中Dataset的输出数据
--查看Power BI中Dataset的输出数据--

“正如您在图片中看到的那样,我有一列以上述格式(标题:信息|标题:信息|标题:信息)将多个信息组合在一起(自动生成系统的结果)。我已经分离了这些数据在我的情况下使用分隔符“|”。但由于原始输入数据中缺少一对(标题:信息),最终在单独的列中出现杂乱无章的数据“

真正的问题


“每个列现在都有应该属于另一个列的值。这是因为输入数据的每个单元格中缺少信息(标题:信息)对。因此,多个单元格跳到下一列(标题:信息)对导致一列充满异构(标题:信息)对“

例如: -

参考代码


M Language

    let
        #"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns1", 
        "Product Details", Splitter.SplitTextByDelimiter("|",QuoteStyle.Csv),
        {"Product Details.1", "Product Details.2", "Product Details.3",
        "Product Details.4", "Product Details.5", "Product Details.6", 
        "Product Details.7", "Product Details.8", "Product Details.9", 
        "Product Details.10", "Product Details.11", "Product Details.12", 
        "Product Details.13", "Product Details.14", "Product Details.15", 
        "Product Details.16", "Product Details.17", "Product Details.18", 
        "Product Details.19", "Product Details.20"})
    in
        #"Split Column by Delimiter"

请求解决方案

预期结果

笔记:

样本:

根据@Mr. 的要求。罗恩,我无法以 csv、xlsx、txt 格式提供任何适当的 SAMPLE 文件,因为 Stack Overflow 不允许这样做,但我会尝试用一个简单的参考来解释我的问题

期望:每个(标题:信息)数据应该在整个列中同步

|    Header-1    |    Header-2    |    Header-3    |    Header-4    |    Header-5    |
|    Name:abc    |    SKU:1234    |    order:a1    |   invoice:1a   |   Shipment:0   |
|    Name:eef    |    SKU:5678    |    order:b2    |   invoice:2b   |   Shipment:1   |
|    Name:ghi    |    SKU:1256    |    order:c3    |   invoice:3c   |   Shipment:0   |
|    Name:jkl    |    SKU:3478    |    order:d4    |   invoice:4d   |   Shipment:1   |

现实:第 3、第 4 和第 5 列 (title:info) 的数据在整个 Column 中不一致

|    Header-1    |    Header-2    |    Header-3    |    Header-4    |    Header-5    |
|    Name:abc    |    SKU:1234    |    order:a1    |   Shipment:0   |  available:N0  |
|    Name:eef    |    SKU:5678    |    order:b2    |   invoice:2b   |   Shipment:1   |
|    Name:ghi    |    SKU:1256    |  available:N0  |   price:2344   | Discount:0.02% |
|    Name:jkl    |    SKU:3478    |    order:d4    |   invoice:4d   |   Shipment:1   |

我希望现在,它会很清楚

标签: pythonexcelpowerbidata-sciencedata-modeling

解决方案


不确定您到底想要什么结果,但要根据您发布的数据创建,

  • 表头是数据类型的表
  • 假设每一行代表一个特定的产品

如果这不是你想要的,请澄清。

你可以:

  • 拆分管道上的数据
  • 删除空列
  • 添加一列来表示行号
  • 取消旋转其他列
  • 拆分冒号上的值列
  • 在选择“不聚合”的值列上进行旋转

M 代码
阅读注释以更好地理解
第 2 行中的算法更改表名称

let

//just preprocessing to get from what you posted
//  to pipe-separated table
    Source = Excel.CurrentWorkbook(){[Name="Table25"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"|    Header-1    |    Header-2    |    Header-3    |    Header-4    |    Header-5    |", type text}}),
    #"Demoted Headers" = Table.DemoteHeaders(#"Changed Type"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Demoted Headers",{{"Column1", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type1", "Column1", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Column1.1", "Column1.7"}),
    #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Column1.2", Text.Trim, type text}, {"Column1.3", Text.Trim, type text}, {"Column1.4", Text.Trim, type text}, {"Column1.5", Text.Trim, type text}, {"Column1.6", Text.Trim, type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Trimmed Text", [PromoteAllScalars=true]),
    #"Changed Type3" = Table.TransformColumnTypes(#"Promoted Headers",{{"Header-1", type text}, {"Header-2", type text}, {"Header-3", type text}, {"Header-4", type text}, {"Header-5", type text}}),

//Add index column to retain original row numbers
    rowNums = Table.AddIndexColumn(#"Changed Type3","Row Number",0,1,Int64.Type),

//Unpivot except for rowNum column
//remove Attribue column
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(rowNums, {"Row Number"}, "Attribute", "Value"),
    #"Removed Columns1" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),

//split by the colon delimiter
//set data types
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns1", "Value", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Value.1", "Value.2"}),
    #"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Value.1", type text}, {"Value.2", type text}}),

//Pivot on the Value.1 column
//Remove the row number column
    #"Pivoted Column" = Table.Pivot(#"Changed Type4", List.Distinct(#"Changed Type4"[Value.1]), "Value.1", "Value.2"),
    #"Removed Columns2" = Table.RemoveColumns(#"Pivoted Column",{"Row Number"})
in
    #"Removed Columns2"

在此处输入图像描述


推荐阅读