首页 > 解决方案 > 从excel工作表中的单元格中删除冗余数据

问题描述

我的数据存在于 2 个不同列的两个单元格中。

例如:
ColA:A1 单元格有逗号分隔值 1、2、3
ColB:B1 单元格有逗号分隔值 ABC、DEF、ABC

想要实现逻辑,以便它应该显示为,

ColA    ColB
1,3     ABC
2       DEF

Ex2.:
ColA: A1 单元格有逗号分隔值 1,2,3
ColB: B1 单元格有逗号分隔值 ABC,ABC,ABC

ColA      ColB
1,2,3     ABC

到目前为止,我已经为 B 列实现了逻辑,但是,在执行此操作时无法更新 A 列数据。

Sub RemoveDupData()
    Dim sString As String
    Dim MyAr As Variant
    Dim Col As New Collection
    Dim itm

    sString = "ABC,DEF,ABC,CDR"

    MyAr = Split(sString, ",")

    For i = LBound(MyAr) To UBound(MyAr)
        On Error Resume Next
        '-- A collection cannot have the same key twice so here, we are creating a key using the item that we are adding.
        '-- This will ensure that we will not get duplicates.       
        Col.Add Trim(MyAr(i)), CStr(Trim(MyAr(i)))
        On Error GoTo 0
    Next i

    sString = ""

    For Each itm In Col
        sString = sString & "," & itm
    Next

    sString = Mid(sString, 2)

End Sub

标签: vbaexcelduplicates

解决方案


这种方法比 Jeeped 的方法更复杂,但可能更容易适应变化。

我进行了逐行类型的处理,但是,通过简单地更改密钥的生成方式,可以对整个数据集 colB 进行重复数据删除(参见代码中的注释)

我使用字典来确保键不重复,并且字典项将是相关 colA 值的集合。

Sub FixData()
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes  As Range
    Dim vSrc As Variant, vRes As Variant
    Dim vA As Variant, vB As Variant
    Dim I As Long, J As Long
    Dim dD As Object, Col As Collection
    Dim sKey As String

Set wsSrc = Worksheets("sheet1")

'Note that depending on how you set these parameters, you will be
'able to write the Results anyplace in the workbook,
'even overlying the original data
Set wsRes = Worksheets("sheet1")
    Set rRes = wsRes.Cells(1, 5)

With wsSrc
    vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 2).End(xlUp))
End With

'Use a dictionary to collect both the unique items in ColB (which will be the key)
'and a collection of the relevant objects in ColA
Set dD = CreateObject("scripting.dictionary")
For I = 1 To UBound(vSrc, 1)
    vA = Split(vSrc(I, 1), ",")
    vB = Split(vSrc(I, 2), ",")
        If UBound(vA) <> UBound(vB) Then
            MsgBox "different number of elements in each column"
        End If

        For J = 0 To UBound(vA)
            sKey = vB(J) & "|" & I

            'To remove dups from the entire data set
            ' change above line to:
            'sKey = vB(J)

            If Not dD.Exists(sKey) Then
                Set Col = New Collection
                Col.Add vA(J)
                dD.Add Key:=sKey, Item:=Col
            Else
                dD(sKey).Add vA(J)
            End If
        Next J
Next I

'Create Results array
ReDim vRes(1 To dD.Count, 1 To 2)
I = 0
For Each vB In dD.Keys
    I = I + 1
    vRes(I, 2) = Split(vB, "|")(0)

    For J = 1 To dD(vB).Count
        vRes(I, 1) = vRes(I, 1) & "," & dD(vB)(J)
    Next J
        vRes(I, 1) = Mid(vRes(I, 1), 2) 'remove leading comma
Next vB

'write results to worksheet
Set rRes = rRes.Resize(UBound(vRes, 1), 2)
With rRes
    .EntireColumn.Clear
    .Value = vRes
    .HorizontalAlignment = xlLeft
End With
End Sub

源数据

在此处输入图像描述

逐行处理

在此处输入图像描述

整个数据集处理

在此处输入图像描述


推荐阅读