首页 > 解决方案 > 如何使用字典对象从工作表中的重复标题中查找和组合数据

问题描述

我正在尝试获取第一张图像中存在的数据(带有标记),将其放入字典对象(我刚刚了解),查找重复的标题(在本例中为“P8”条目),然后获取重复项并将它们与该标题的第一次出现组合,然后删除与重复标题关联的部分。第二张图片(无标记)是说完所有数据后的外观。请注意,“pinlables:[] 现在有多个数据实例组合在重复项中组合成一个实例。

这是我设法拼凑起来的代码(无论如何我都不是程序员,我写的最后一个 VBA 程序是 5 年前写的,我花了很长时间,我只是被这个任务卡住了,因为据我所知,这是我们小团队中最多的)我知道它缺少一些关键元素,例如正确加载密钥,那是因为我不太明白如何从我读过的文章和代码中做到这一点. 我知道一般组织步骤我只是有点迷失如何使用字典对象并使其与正确的循环一起工作。所以我试图在缺失的部分发表评论,以确定我认为需要发生的事情。还可能值得注意的是,此表中的数据具有非常特定的空格、逗号、括号等格式,因为我的最终输出是 . yml 输入文件,该文件输入另一个程序。因此,如果我可以保留格式,那就太好了。

    Sub AltDictSort()

Dim Rng As Range
Dim Dn As Range
Dim n As Long
Dim nRng As Range
Dim tempDN As String
Dim TxtRng As Range

Set Rng = Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))

With CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
For Each Dn In Rng
    If .Exists(Dn.Value) Then
         
    'not sure this next line does what I'm intending
    tempDN = .Item(Dn.Value).Offset(2, 0) 'load Dn.Value into temp value should be something like "   pinlabels: [J2-1,J2-2,J2-3]"
    
    Dn.Value = Left(tempDN, Len(tempDN) - 15) 'Strip 15 characters from left to get "J2-1,J2-2,J2-3]"
    tempDN = Dn.Value
    Dn.Value = Right(tempDN, Len(tempDN) - 1) 'Strip 1 characters from right to get "J2-1,J2-2,J2-3"
    tempDN = (Dn.Value + "," + Dn) 'add the two strings together to get something like this "   pinlabels: [J2-1,J2-2,J2-3,J-4,J-5,J-6]"
    
    'now I need to put the combined string back into the spot of the first occurrence of a pinlabels duplicate (in this specific case A8) but need to identify location of first occurrence
    
    'now I need to delete the entire second occurrence ( second P8: and next two rows with mpn and pinlabels) no idea how to do this
    
    Else
    'I don't think anything needs to happen here but I'm not completely sure????
    End If
Next

End With
End Sub

@JohnnieL 这是输入数据看起来像文本的样子,尽管它在发布时似乎丢失了格式。

> connectors:   Startup-R-J2:    mpn: 436450310    pinlabels:
> [J2-1,J2-2,J2-3]
> 
>   P8:    mpn: D38999/20JE26PN    pinlabels: [P8-C,P8-D,P8-E]
> 
>   Startup-R-J1:    mpn: 436450310    pinlabels:
> [J1-4,J1-9,J1-3,J1-6,J1-7]
> 
>   P8:    mpn: D38999/20JE26PN    pinlabels: [P8-G,P8-H,P8-I,P8-J,P8-K]
> 
>   Startup-R-J3:    mpn: 170-009-272L000    pinlabels: [J3-3,J3-2,J3-1]
> 
>   P8:    mpn: D38999/20JE26PN    pinlabels: [P8-R,P8-S,P8-T]
> 
>   PTO1-J2:    mpn: 170-009-272L000    pinlabels: [J2-5,J2-6]
> 
>   P8:    mpn: D38999/20JE26PN    pinlabels: [P8-A,P8-B]
> 
>   PTO3-J2:    mpn: 170-009-272L000    pinlabels: [J2-8,J2-7]
> 
>   P8:    mpn: D38999/20JE26PN    pinlabels: [P8-N,P8-P]
> 
>   PTO3-J2:    mpn: 170-009-272L000    pinlabels: [J2-3,J2-4]
> 
>   P8:    mpn: D38999/20JE26PN    pinlabels: [P8-R,P8-S]
> 
> cables:   Startup-R-J2_P8:    wirecount: 3    gauge: 20 AWG    length:
> 100 mm    color_code: IEC
> 
>   Startup-R-J1_P8:    wirecount: 5    gauge: 22 AWG    length: 200 mm 
> color_code: IEC
> 
>   Startup-R-J3_P8:    wirecount: 3    gauge: 24 AWG    length: 300 mm 
> color_code: IEC
> 
>   PTO1-J2_P8:    wirecount: 2    gauge: 26 AWG    length: 400 mm   
> color_code: IEC
> 
>   PTO3-J2_P8:    wirecount: 2    gauge: 28 AWG    length: 500 mm   
> color_code: IEC
> 
>   PTO3-J2_P8:    wirecount: 2    gauge: 30 AWG    length: 600 mm   
> color_code: IEC
> 
> 
> connections:
> -
>   - Startup-R-J2: [J2-1,J2-2,J2-3]
>   - Startup-R-J2_P8: [1-3]
>   - P8: [P8-C,P8-D,P8-E]
> -
>   - Startup-R-J1: [J1-4,J1-9,J1-3,J1-6,J1-7]
>   - Startup-R-J1_P8: [1-5]
>   - P8: [P8-G,P8-H,P8-I,P8-J,P8-K]
> -
>   - Startup-R-J3: [J3-3,J3-2,J3-1]
>   - Startup-R-J3_P8: [1-3]
>   - P8: [P8-R,P8-S,P8-T]
> -
>   - PTO1-J2: [J2-5,J2-6]
>   - PTO1-J2_P8: [1-2]
>   - P8: [P8-A,P8-B]
> -
>   - PTO3-J2: [J2-8,J2-7]
>   - PTO3-J2_P8: [1-2]
>   - P8: [P8-N,P8-P]
> -
>   - PTO3-J2: [J2-3,J2-4]
>   - PTO3-J2_P8: [1-2]
>   - P8: [P8-R,P8-S]

标签: excelvbadictionaryduplicates

解决方案


我将从一个可能看起来像这样的类模块开始 - 我们ConnectorInfo现在称之为:

Option Explicit
Public ConnectorID As String
Public MPN As String
Public PinLabels As New Collection

这个想法是对我们正在查看的数据进行建模;输出中表示的每个“对象”都有一个“ConnectorID”值(“P8”、“Startup-R-J1”、“PTO3-J2”等)、一个 MPN 值(“436450310”、“170-009- 272L000" 等),以及许多需要组合的引脚标签,因此需要有代码可以将这个PinLabels集合变成一个字符串,用逗号分隔它们并用方括号将列表包裹起来。

因此,让我们向该类模块添加一个公共函数,它通过将集合复制到一个数组中来完成此操作,然后使用该VBA.Strings.Join函数生成引脚标签列表:

Public Function CombinePinLabels() As String
    ReDim result(1 To PinLabels.Count) As String
    Dim i As Long
    For i = 1 To PinLabels.Count
        result(i) = PinLabels(i)
    Next
    CombinePinLabels = "[" & Join(result, ",") & "]"
End Function

由于输入将读取 PinLabels 作为字符串,我们需要一个程序(因为我们在一个类模块中,我们可以将其称为“方法”)为我们拼接它们,同时确保没有任何标签重复;我们可以通过键入集合项来做到这一点(不需要字典,因为我们实际上并没有访问键):

Public Sub ParsePinLabels(ByVal inputValue As String)
    'expect inputValue to look like "[123,456,ABC-123,XYZ-000-ABC]"; assert that (i.e. break here before we make a mess):
    Debug.Assert Left$(inputValue, 1) = "["
    Debug.Assert Right$(inputValue, 1) = "]"
    
    'strip the prefix and brackets:
    Dim parsed As String
    parsed = Mid$(inputValue, 2, Len(inputValue - 2))

    Dim values As Variant
    values = Strings.Split(parsed, ",")

    Dim i As Long
    For i = LBound(values) To UBound(values)
        On Error Resume Next 'prevent blowing up when key already exists
        PinLabels.Add values(i), values(i)
        On Error GoTo 0 'important!
    Next
End Sub

请注意,输入逻辑和格式在这里基本上是不相关的:需要进行的处理独立于输入格式输出格式。

那么让我们构建输出。

[...] 我的最终输出是一个输入另一个程序的 .yml 输入文件。

绝对折腾操纵Excel对象的想法:你想要的是让你的代码生成一个.yml文本文件。

处理输入的代码将为ConnectorInfo产生输出的代码提供一组对象,所以我们已经知道我们需要一个过程。在标准模块(例如Module1)中,您希望有这样的过程:

Public Sub GenerateOutputYML(ByVal connectors As Collection)
    Dim connector As ConnectorInfo
    For Each connector In connectors
       'TODO
    Next
End Sub

但是,我们需要它输出到一个特定的文件名——让我们把它作为一个参数来考虑我们以后如何提供它:

Public Sub GenerateOutputYML(ByVal filePath As String, ByVal connectors As Collection)
    Dim handle As Long
    handle = VBA.FreeFile

    On Error GoTo CleanFail 'MUST handle errors when dealing with filesystem I/O
    Open filePath For Output As #handle
    Print #handle, "connectors:"

    'use ForEach..Next loops to iterate object collections
    Dim connector As ConnectorInfo
    For Each connector In connectors
       'each Print # statement writes a line to the text file,
       'Spc() function writes the number of specified spaces to control indentation.
       Print #handle, Spc(2) & connector.ConnectorID & ":"
       Print #handle, Spc(4) & "mpn: " & connector.MPN
       Print #handle, Spc(4) & "pinlabels: " & connector.CombinePinLabels
       Print #handle 'leaves an empty line between connectors
    Next

CleanExit:
    Close #handle
    Exit Sub
CleanFail:
    MsgBox Err.Description
    Resume CleanExit
End Sub

现在剩下要做的就是将输入解析为这样CollectionConnectorInfo对象。您可以通过在 Excel 中打开文本文件然后迭代单元格来执行此操作 - 或者您可以使用类似的Open语句以编程方式在内存中打开文本文件,并且可以存在于接受文件名并返回的函数中输出函数想要使用的集合:

Public Function ParseInput(ByVal intputFilePath As String) As Collection

    Dim handle As Long
    handle = VBA.FreeFile 'gets an available file handle
 
    On Error GoTo CleanFail
    Open inputFilePath For Input As #handle 'never hard-code the handle!

    Dim currentLine As String
    LineInput #handle, currentLine 'read the first line
    Debug.Assert currentLine = "connectors:" 'right?

    Dim contents As Object 'early-bound: As Scripting.Dictionary (requires library reference)
    Set contents = CreateObject("Scripting.Dictionary") 'early-bound: = New Scripting.Dictionary

    Dim currentItem As ConnectorInfo
    Dim currentKey As String

    Do Until EOF(handle)

        LineInput #handle, currentLine
        currentKey = Left$(currentLine, Len(currentLine) - 1) 'strip the colon char

        If contents.Exists(currentKey) Then
            'we have seeen this ID before; fetch it
            Set currentItem = contents(currentKey)
        Else
            'new ID; create a new info object
            Set currentItem = New ConnectorInfo
            contents.Add currentKey, currentItem
        End If

        'assumes MPN is the same for all duplicates of a given ConnectorID

        LineInput#handle, currentLine
        currentItem.MPN = Mid$(currentLine, Len("mpn: "))            

        LineInput#handle, currentLine
        currentItem.ParsePinLabels Mid$(currentLine, Len("pinlabels: ["))

    Loop

    'at this point the items dictionary should contain all the ConnectorInfo objects we want to output.
    'GenerateOutputYML wants a Collection, so we iterate the array returned dictionary's Items function
    Dim result As New Collection

    Dim i As Long 'use a For..Next loop to iterate arrays
    For i = LBound(contents.Items) To UBound(contents.Items)
        result.Add contents.Items(i)
    Next

CleanExit:
    Close #handle
    Set ParseInput = result
    Exit Function
CleanFail:
    MsgBox Err.Description 'for debugging; user doesn't need to see this
    Set result = New Collection 'return an empty collection on error
    Resume CleanFail
End Function

缺少的部分是一个宏,它知道从哪里获取输入文件,在哪里保存输出文件,并调用读取器和写入器过程 - 现在我们已经抽象出所有血腥的细节,我们留下了一个清晰的高度-级别的故事要讲:

Public Sub ParseYML()
    Const inputFile As String = "C:\Path\Input.txt"
    Const outputFile As String = "C:\Path\Output.yml"

    Dim connectors As Collection
    Set connectors = ParseInput(inputFile)

    If connectors.Count > 0 Then
        GenerateOutputYML outputFile, connectors
        MsgBox "File '" & outputFile & "' was generated successfully for " & connectors.Count & " connectors."
    Else
        MsgBox "No data was read from the specified input file."
    End If

End Sub

这不是唯一可行的方法,但作为一般经验法则,将数据本身ConnectorInfo(正在生成,可能会工作......但之后很容易难以调整。

通过将解析输入与生成输出分开,您可以更轻松地准确隔离需要调整的代码,而不必影响代码的其他部分。


推荐阅读