Profiling code with memory_profiler increases execution time

Problem Description

I am writing a simple application that splits a large text file into smaller files, and I wrote two versions of it: one using lists and one using generators. I profiled both versions with the memory_profiler module, and it clearly showed the better memory efficiency of the generator version. The odd part is that profiling the generator version also increases its execution time. The demonstration below shows what I mean.

The version using lists

from memory_profiler import profile


@profile()
def main():
    file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
    input_file = open(file_name).readlines()
    num_lines_orig = len(input_file)
    parts = int(input("Enter the number of parts you want to split in: "))
    output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
    st = 0
    p = int(num_lines_orig / parts)
    ed = p
    for i in range(parts-1):
        with open(output_files[i], "w") as OF:
            OF.writelines(input_file[st:ed])
        st = ed
        ed = st + p

    with open(output_files[-1], "w") as OF:
        OF.writelines(input_file[st:])


if __name__ == "__main__":
    main()

When run with the profiler:

$ time py36 Splitting\ text\ files_BAD_usingLists.py                                                                                                               

Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
Filename: Splitting text files_BAD_usingLists.py

Line #    Mem usage    Increment   Line Contents
================================================
     6     47.8 MiB      0.0 MiB   @profile()
     7                             def main():
     8     47.8 MiB      0.0 MiB       file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
     9    107.3 MiB     59.5 MiB       input_file = open(file_name).readlines()
    10    107.3 MiB      0.0 MiB       num_lines_orig = len(input_file)
    11    107.3 MiB      0.0 MiB       parts = int(input("Enter the number of parts you want to split in: "))
    12    107.3 MiB      0.0 MiB       output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
    13    107.3 MiB      0.0 MiB       st = 0
    14    107.3 MiB      0.0 MiB       p = int(num_lines_orig / parts)
    15    107.3 MiB      0.0 MiB       ed = p
    16    108.1 MiB      0.7 MiB       for i in range(parts-1):
    17    107.6 MiB     -0.5 MiB           with open(output_files[i], "w") as OF:
    18    108.1 MiB      0.5 MiB               OF.writelines(input_file[st:ed])
    19    108.1 MiB      0.0 MiB           st = ed
    20    108.1 MiB      0.0 MiB           ed = st + p
    21                             
    22    108.1 MiB      0.0 MiB       with open(output_files[-1], "w") as OF:
    23    108.1 MiB      0.0 MiB           OF.writelines(input_file[st:])



real    0m6.115s
user    0m0.764s
sys     0m0.052s

When run without the profiler:

$ time py36 Splitting\ text\ files_BAD_usingLists.py 
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3

real    0m5.916s
user    0m0.696s
sys     0m0.080s

Now the version using generators:

@profile()
def main():
    file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
    input_file = open(file_name)
    num_lines_orig = sum(1 for _ in input_file)
    input_file.seek(0)
    parts = int(input("Enter the number of parts you want to split in: "))
    output_files = ((file_name + str(i)) for i in range(1, parts + 1))
    st = 0
    p = int(num_lines_orig / parts)
    ed = p
    for i in range(parts-1):
        file = next(output_files)
        with open(file, "w") as OF:
            for _ in range(st, ed):
                OF.writelines(input_file.readline())

            st = ed
            ed = st + p
            if num_lines_orig - ed < p:
                ed = st + (num_lines_orig - ed) + p
            else:
                ed = st + p

    file = next(output_files)
    with open(file, "w") as OF:
        for _ in range(st, ed):
            OF.writelines(input_file.readline())


if __name__ == "__main__":
    main()

When run with the profiler option:

$ time py36 -m memory_profiler Splitting\ text\ files_GOOD_usingGenerators.py                                                                                                                                      
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
Filename: Splitting text files_GOOD_usingGenerators.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   47.988 MiB    0.000 MiB   @profile()
     5                             def main():
     6   47.988 MiB    0.000 MiB       file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
     7   47.988 MiB    0.000 MiB       input_file = open(file_name)
     8   47.988 MiB    0.000 MiB       num_lines_orig = sum(1 for _ in input_file)
     9   47.988 MiB    0.000 MiB       input_file.seek(0)
    10   47.988 MiB    0.000 MiB       parts = int(input("Enter the number of parts you want to split in: "))
    11   48.703 MiB    0.715 MiB       output_files = ((file_name + str(i)) for i in range(1, parts + 1))
    12   47.988 MiB   -0.715 MiB       st = 0
    13   47.988 MiB    0.000 MiB       p = int(num_lines_orig / parts)
    14   47.988 MiB    0.000 MiB       ed = p
    15   48.703 MiB    0.715 MiB       for i in range(parts-1):
    16   48.703 MiB    0.000 MiB           file = next(output_files)
    17   48.703 MiB    0.000 MiB           with open(file, "w") as OF:
    18   48.703 MiB    0.000 MiB               for _ in range(st, ed):
    19   48.703 MiB    0.000 MiB                   OF.writelines(input_file.readline())
    20                             
    21   48.703 MiB    0.000 MiB               st = ed
    22   48.703 MiB    0.000 MiB               ed = st + p
    23   48.703 MiB    0.000 MiB               if num_lines_orig - ed < p:
    24   48.703 MiB    0.000 MiB                   ed = st + (num_lines_orig - ed) + p
    25                                         else:
    26   48.703 MiB    0.000 MiB                   ed = st + p
    27                             
    28   48.703 MiB    0.000 MiB       file = next(output_files)
    29   48.703 MiB    0.000 MiB       with open(file, "w") as OF:
    30   48.703 MiB    0.000 MiB           for _ in range(st, ed):
    31   48.703 MiB    0.000 MiB               OF.writelines(input_file.readline())



real    1m48.071s
user    1m13.144s
sys     0m19.652s

When run without the profiler:

$ time py36  Splitting\ text\ files_GOOD_usingGenerators.py 
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3

real    0m10.429s
user    0m3.160s
sys     0m0.016s

So first, why does profiling slow my code down at all? And second, if profiling affects execution speed, why does that effect not show up on the list-based version of the code?

Tags: python, memory-profiling

Solution


I CPU-profiled the code with line_profiler, and this time I got my answer: the reason the generator version takes more time is the following lines.
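For reference, per-line CPU timings like the ones below usually come from kernprof, the driver script that ships with line_profiler; a plausible invocation for this script would be the following (note that kernprof injects a plain @profile decorator, without the parentheses that memory_profiler's @profile() takes):

$ pip install line_profiler
$ kernprof -l -v Splitting\ text\ files_GOOD_usingGenerators.py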

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    19         2      11126.0   5563.0      0.2          with open(file, "w") as OF:
    20    379886     200418.0      0.5      3.0              for _ in range(st, ed):
    21    379884    2348653.0      6.2     35.1                  OF.writelines(input_file.readline())

And the reason the list version does not slow down:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    19         2       9419.0   4709.5      0.4          with open(output_files[i], "w") as OF:
    20         2    1654165.0 827082.5     65.1              OF.writelines(input_file[st:ed])

With the list version, each new file is written by slicing the list, i.e. taking a copy, which executes as a single statement per part (2 hits above). With the generator version, each new file is filled by reading the input file line by line, so the same work is spread over hundreds of thousands of executed lines (379,884 hits above), and the memory profiler adds its per-line measurement cost to every one of them.
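That overhead pattern is exactly what you would expect from a line-tracing profiler: memory_profiler hooks Python's tracing machinery (sys.settrace), so its callback fires on every executed line, and the measurement cost scales with the number of line events rather than with how much work each line does. Here is a minimal sketch of that mechanism (illustrative only, not memory_profiler's actual implementation):

import sys

events = 0

def tracer(frame, event, arg):
    # A real profiler would sample memory or time here; we only count events.
    global events
    if event == "line":
        events += 1
    return tracer

def per_line(n):
    # Each loop iteration generates line events, like the generator version's inner loop.
    total = 0
    for i in range(n):
        total += i
    return total

def one_statement(n):
    # The whole loop runs inside C, like writelines() on a list slice:
    # the tracer sees only a single line event.
    return sum(range(n))

sys.settrace(tracer)
per_line(100_000)
sys.settrace(None)
print("line events in per_line:", events)        # roughly 200,000

events = 0
sys.settrace(tracer)
one_statement(100_000)
sys.settrace(None)
print("line events in one_statement:", events)   # just a handful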


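If you want the generator version's memory behavior without paying the per-line tracing cost, one option is to hand the file iterator to writelines() in chunks. The sketch below (a hypothetical split_file helper, not the original poster's code) uses itertools.islice so each part is still streamed lazily, but is written with a single profiled statement:

from itertools import islice

def split_file(file_name, parts):
    with open(file_name) as input_file:
        num_lines = sum(1 for _ in input_file)  # streaming line count, O(1) memory
        input_file.seek(0)
        per_part = num_lines // parts
        for i in range(1, parts + 1):
            # islice lazily yields up to per_part lines; the last part takes the rest
            chunk = islice(input_file, per_part) if i < parts else input_file
            with open(file_name + str(i), "w") as out:
                out.writelines(chunk)  # one traced line per part, not one per input line

Under a line-tracing profiler this executes only a handful of line events regardless of file size, while peak memory stays close to the generator version's.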