首页 > 解决方案 > 文本字符串中单词的唯一计数

问题描述

我有一个包含多个字符串的数据集,我想要一个唯一的出现次数,以便我可以查看和优化我的数据集。我一直无法使用公式来做到这一点,所以转而使用 VBA,但由于我是业余爱好者,所以遇到了障碍。

我的数据看起来像这样...

在此处输入图像描述

我希望它返回这个...

在此处输入图像描述

我尝试将文本解析为列,但在大型数据集中,我的字符串中有 60 列,其中有 100 次点击。因此,将其转置然后尝试计算唯一性将是令人生畏的。

因此,我希望 VBA 会有所帮助,但我似乎只能获得一个函数,而不是使用 Sub 和 Function 转置然后计数。像下面这样...

Sub Main()
    Dim filename As String
    Dim WorksheetName As String
    Dim CellRange As String
    
    Sheets.Add.Name = "ParsedOutput"

'==============================================================
' CHANGE THESE VALUES FOR YOUR SHEET   
WorksheetName =   
CellRange =    
'==============================================================
   
    ' Get range
    Dim Range
    Set Range = ThisWorkbook.Worksheets(WorksheetName).Range(CellRange)

    ' Copy range to avoid overwrite
    Range.Copy _
        Destination:=ThisWorkbook.Worksheets("ParsedOutput").Range("A1")
        
    ' Get copied exclusions
    Dim Copy
    Set Copy = ThisWorkbook.Worksheets("ParsedOutput").Range("A:A")
    
    ' Parse and overwrite
    Copy.TextToColumns _
        Destination:=Range("A:A"), _
        DataType:=xlDelimited, _
        ConsecutiveDelimiter:=True, _
        Comma:=True

End Sub

Option Explicit

Public Function Counter(InputRange As Range) As String

Dim CellValue As Variant, UniqueValues As New Collection

Application.Volatile

'For error Handling On Error Resume Next

'Looping through all the cell in the defined range For Each CellValue In InputRange
    UniqueValues.Add CellValue, CStr(CellValue)  ' add the unique item Next

'Returning the count of number of unique values CountUniqueValues = UniqueValues.Count

End Function

标签: excelvbapowerquery

解决方案


为了简单起见,我将使用最少的数据来演示如何实现您想要的。随意更改代码以满足您的需求。

电子表格

假设我们的工作表看起来像这样

在此处输入图像描述

逻辑:

  1. 找到最后一行和最后一列,如图所示构建您的范围。
  2. 将该范围的值存储在一个数组中。
  3. 循环遍历该数组中的每个项目并提取基于,分隔符的单词并将其存储在集合中。如果分隔符不存在,则将整个单词存储在集合中。要创建一个唯一的集合,我们使用On Error Resume Next如下代码所示。
  4. 根据集合中的单词数,我们创建一个二维数组用于输出。数组的一部分将保存单词,另一部分将保存出现次数。
  5. 使用.Find 和 .FindNext计算某个单词在范围内的出现次数,然后将其存储在数组中。
  6. 将数组一次性写入相关单元格。出于演示目的,我将写信给D 列

代码

我已经对代码进行了注释,因此您理解它应该没有问题,但是如果您这样做了,那么只需询问即可。

Option Explicit

Sub Sample()
    Dim ws As Worksheet
    
    '~~> Change this to relevant sheet
    Set ws = Sheet1
    
    Dim LastRow As Long, LastColumn As Long
    Dim i As Long, j As Long, k As Long
    Dim col As New Collection
    Dim itm As Variant, myAr As Variant, tmpAr As Variant
    Dim OutputAr() As String
    Dim aCell As Range, bCell As Range, rng As Range
    Dim countOfOccurences As Long
    
    With ws
        '~~> Find last row
        LastRow = .Cells.Find(What:="*", _
                  After:=.Range("A1"), _
                  Lookat:=xlPart, _
                  LookIn:=xlFormulas, _
                  SearchOrder:=xlByRows, _
                  SearchDirection:=xlPrevious, _
                  MatchCase:=False).Row
        
        '~~> Find last column
        LastColumn = .Cells.Find(What:="*", _
                     After:=.Range("A1"), _
                     Lookat:=xlPart, _
                     LookIn:=xlFormulas, _
                     SearchOrder:=xlByColumns, _
                     SearchDirection:=xlPrevious, _
                     MatchCase:=False).Column
                     
        '~~> Construct your range
        Set rng = .Range(.Cells(1, 1), .Cells(LastRow, LastColumn))
        
        '~~> Store the value in an array
        myAr = rng.Value2
        
        '~~> Create a unique collection
        For i = LBound(myAr) To UBound(myAr)
            For j = LBound(myAr) To UBound(myAr)
                If Len(Trim(myAr(i, j))) <> 0 Then
                    '~~> Check data has "," delimiter
                    If InStr(1, myAr(i, j), ",") Then
                        tmpAr = Split(myAr(i, j), ",")
                        
                        For k = LBound(tmpAr) To UBound(tmpAr)
                            On Error Resume Next
                            col.Add tmpAr(k), CStr(tmpAr(k))
                            On Error GoTo 0
                        Next k
                    Else
                        On Error Resume Next
                        col.Add myAr(i, j), CStr(myAr(i, j))
                        On Error GoTo 0
                    End If
                End If
            Next j
        Next i
        
        '~~> Count the number of items in the collection
        i = col.Count
        
        '~~> Create output array for storage
        ReDim OutputAr(1 To i, 1 To 2)
        i = 1
        
        '~~> Loop through unique collection
        For Each itm In col
            OutputAr(i, 1) = Trim(itm)
            countOfOccurences = 0
            
            '~~> Use .Find and .Findnext to count for occurences
            Set aCell = rng.Find(What:=OutputAr(i, 1), LookIn:=xlValues, _
                Lookat:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
                MatchCase:=False, SearchFormat:=False)
        
            If Not aCell Is Nothing Then
                Set bCell = aCell
                countOfOccurences = countOfOccurences + 1
                Do
                    Set aCell = rng.FindNext(After:=aCell)
        
                    If Not aCell Is Nothing Then
                        If aCell.Address = bCell.Address Then Exit Do
                        countOfOccurences = countOfOccurences + 1
                    Else
                        Exit Do
                    End If
                Loop
            End If
            
            '~~> Store count in array
            OutputAr(i, 2) = countOfOccurences
            i = i + 1
        Next itm
        
        '~~> Output it to relevant cell
        .Range("D1").Resize(UBound(OutputAr), 2).Value = OutputAr
    End With
End Sub

输出

在此处输入图像描述


推荐阅读