首页 > 解决方案 > 如何优化/加速此代码,以便我可以处理大数据集?

问题描述

我正在使用“大”excel数据集,必须在一个范围内找到唯一值并使用该信息来获取:平均值、标准偏差、中值、最小值和最大值。

所以下面的代码可以工作,但我需要通过 41000(x)“循环”。所以它会有点重,所以在我尝试之前,有什么可以在优化方面做的吗?

谢谢!

Sub Finddata()
Dim Startdate As Date
Dim Finalrow As Long
Dim EndDate As Date
Dim Targetperiod As Integer
Dim Company As String
Dim i As Long
Dim d As Integer
Dim x As Long
Dim duplicaterow As Integer
Dim Newduplicaterow As Integer
Dim SourceBook As Workbook
Dim Datasheet As Worksheet, Duplicatesheet As Worksheet

Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False

Set SourceBook = ActiveWorkbook
Set Datasheet = SourceBook.Sheets("Data")
Set Duplicatesheet = SourceBook.Sheets("Duplicate sheet")

Finalrow = Datasheet.Range("A60000").End(xlUp).Row


For x = 2 To 10

Startdate = Datasheet.Range("r" & x)
EndDate = Datasheet.Range("q" & x)
Company = Datasheet.Range("p" & x)
Targetperiod = Datasheet.Range("i" & x)


'Copy data to duplicate sheet
For i = 2 To Finalrow
    If (Cells(i, 17) >= Startdate And Cells(i, 17) <= EndDate And Cells(i, 16) = Company And Cells(i, 9) = Targetperiod) Then
    Range(Cells(i, 1), Cells(i, 19)).Copy
    Duplicatesheet.Range("a10000").End(xlUp).Offset(1, 21).PasteSpecial xlPasteValues
    Duplicatesheet.Range("a10000").End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
    End If
Next i

'Definition
duplicaterow = Duplicatesheet.Range("v10000").End(xlUp).Row

'Removes duplicate analyst names
Duplicatesheet.Range("v1", "an" & duplicaterow).RemoveDuplicates Columns:=14, Header:=xlYes

'Definition
Newduplicaterow = Duplicatesheet.Range("v10000").End(xlUp).Row + 1

'Removes two columns that are not needed
Duplicatesheet.Range("Am:An").EntireColumn.Delete

'Below is for finding most recent observation and Target price.
d = 2

Do While Duplicatesheet.Cells(d, 38).Value <> ""
    Duplicatesheet.Cells(d, 39).FormulaLocal = "=MAX(IF('Duplicate sheet'!n:n='Duplicate sheet'!Ai" & d & ";'Duplicate sheet'!q:q;))"
    d = d + 1
Loop

  Dim c As Range
  For Each c In Duplicatesheet.Range("Am2", "am" & Newduplicaterow)
    c.FormulaArray = c.FormulaR1C1
  Next c

d = 2

Do While Duplicatesheet.Cells(d, 38).Value <> ""
    Duplicatesheet.Cells(d, 39).Value = Duplicatesheet.Cells(d, 39).Value
    Duplicatesheet.Cells(d, 40).Value = Duplicatesheet.Cells(d, 35) & ", " & Duplicatesheet.Cells(d, 39)
    d = d + 1
Loop

d = 2

Do While Duplicatesheet.Cells(d, 38).Value <> ""
    Duplicatesheet.Cells(d, 41).FormulaLocal = "=index('Duplicate sheet'!d:d;match('Duplicate sheet'!AN" & d & ";'Duplicate sheet'!s:s;0);0)"
    Duplicatesheet.Cells(d, 41).Value = Duplicatesheet.Cells(d, 41).Value
    d = d + 1
Loop


'This section creates the values that are needed in the data sheet, for consensus
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).FormulaLocal = "=Average(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).FormulaLocal = "=iferror(STDEV.S(AO2:AO" & Newduplicaterow - 1 & ");count(AO2:AO" & Newduplicaterow - 1 & "))"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).FormulaLocal = "=MEDIAN(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).FormulaLocal = "=Min(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).FormulaLocal = "=max(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).Value

Duplicatesheet.Range("v" & Newduplicaterow + 1, "z" & Newduplicaterow + 1).Copy
Datasheet.Range("t" & x).PasteSpecial xlPasteValues

Duplicatesheet.Range("A2:BB6000").ClearContents

Next x


Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.EnableEvents = True

End Sub

标签: vba

解决方案


如果没有看到数据和工作簿的结构和布局(如果有更简单/更有效的方法来处理您正在做的事情),就很难说。但以下是一些基本的观察。


您可能应该在Range.AutoFilter下面使用(而不是一次循环数万行)。然后一口气复制粘贴Range.SpecialCells(xlCellTypeVisible)到你duplicateSheet的。CDbl()在指定日期过滤条件时,您可能需要将日期转换为双精度(带函数)。(您也可以读入一次数组。使用数组时性能可能会更快,但您还必须编写更多代码。)

    'Copy data to duplicate sheet
    For i = 2 To finalRow
        If (Cells(i, 17) >= startDate And Cells(i, 17) <= EndDate And Cells(i, 16) = Company And Cells(i, 9) = Targetperiod) Then
            Range(Cells(i, 1), Cells(i, 19)).Copy
            duplicateSheet.Range("a10000").End(xlUp).Offset(1, 21).PasteSpecial xlPasteValues
            duplicateSheet.Range("a10000").End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
        End If
    Next i

下面,您将应该是数组公式的内容作为非数组公式分配给列中的每个单元格AM(一次一行);然后再次执行循环(一次一行)将非数组公式转换为数组公式;然后再次执行循环(一次一行)以将公式转换为静态值。

是否有理由不能一次性为整个范围设置数组公式?我相信 Excel 会为您处理相关的单元格引用。然后将所述范围的公式转换为静态值。(或者,您可以将范围的值读入数组,并MAX在内存中计算条件。但如前所述,您需要编写更多代码。)

此外,一些较新版本的 Excel 本身具有MAXIFS功能。如果您可以访问它,请尝试使用它。另一个观察结果是您的数组公式是指整个列。可能值得限制范围(即该列中最后使用的行),因此您只查看 <100k 单元格(例如)而不是 100 万个。

最后一点是字符串的连接可能会很昂贵。通常,您希望JOIN在连接时使用某种形式的函数,但不确定它对性能有多大好处,因为您只连接两个值(每次循环迭代)。

    'Below is for finding most recent observation and Target price.
    d = 2

    Do While duplicateSheet.Cells(d, 38).Value <> ""
        duplicateSheet.Cells(d, 39).FormulaLocal = "=MAX(IF('Duplicate sheet'!n:n='Duplicate sheet'!Ai" & d & ";'Duplicate sheet'!q:q;))"
        d = d + 1
    Loop

    Dim c As Range
    For Each c In duplicateSheet.Range("Am2", "am" & Newduplicaterow)
      c.FormulaArray = c.FormulaR1C1
    Next c

    d = 2

    Do While duplicateSheet.Cells(d, 38).Value <> ""
        duplicateSheet.Cells(d, 39).Value = duplicateSheet.Cells(d, 39).Value
        duplicateSheet.Cells(d, 40).Value = duplicateSheet.Cells(d, 35) & ", " & duplicateSheet.Cells(d, 39)
        d = d + 1
    Loop

您可能可以一次性将下面的公式分配给整个范围。需要注意的一点是,提供0asMATCH的第三个参数意味着您正在执行线性搜索(对于每个循环迭代)。考虑使用字典或集合进行更快的查找(字典可能更方便,因为它有一个Exists方法)。在您的情况下,我认为列中的值S将是键,列中的值D将是键的对应值。

此外,您可以一次性将整个范围转换为静态值(而不是一次循环一行)。

    d = 2

    Do While duplicateSheet.Cells(d, 38).Value <> ""
        duplicateSheet.Cells(d, 41).FormulaLocal = "=index('Duplicate sheet'!d:d;match('Duplicate sheet'!AN" & d & ";'Duplicate sheet'!s:s;0);0)"
        duplicateSheet.Cells(d, 41).Value = duplicateSheet.Cells(d, 41).Value
        d = d + 1
    Loop

希望这能给你一些关于从哪里开始的想法。


推荐阅读