Recursively find files in a directory and replace strings using the fileinput module. How?

Problem description

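I'm creating a simple Python script to find and replace strings in files, including files located in subfolders and so on. This calls for recursion.

The following script finds one string and replaces it with another in every file within every folder under the target parent folder.

I found a post here that suggests using the fileinput module to avoid reading the whole file into memory, which might slow things down...

...simplifies text replacement in a file without requiring the whole file to be read into memory...

Credit: @jfs

Python is very dynamic and, honestly, I get lost among the many different ways to accomplish the same task.

How can I integrate this approach into the script below?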

import subprocess, os, fnmatch

if os.name == 'nt':
    def clear_console():
        subprocess.call("cls", shell=True)
        return
else:
    def clear_console():
        subprocess.call("clear", shell=True)
        return

# Globals
menuChoice = 0
searchCounter = 0

# Recursive find/replace with file extension argument.
def findReplace(directory, find, replace, fileExtension):

    global searchCounter

    #For all paths, sub-directories & files in (directory)...
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        #For each file found with (FileExtension)...
        for filename in fnmatch.filter(files, fileExtension):
            #Construct the target file path...
            filepath = os.path.join(path, filename)
            #Open file correspondent to target filepath.
            with open(filepath) as f:
                # Read it into memory.
                s = f.read()
            # Find and replace all occurrences of (find).
            s = s.replace(find, replace)
            # Write these new changes to the target file path.
            with open(filepath, "w") as f:
                f.write(s)
                # increment search counter by one.
                searchCounter += 1

    # Report final status.
    print ('  Files Searched: ' + str(searchCounter))
    print ('')
    print ('  Search Status : Complete')
    print ('')
    input ('  Press any key to exit...')

def mainMenu():
    global menuChoice
    global searchCounter

    # range(1, 2) contains only 1, the single valid menu option.
    while int(menuChoice) not in range(1, 2):

        clear_console()
        print ('')
        print ('  frx v1.0 - Menu')
        print ('')
        print ('  A. Select target file type extension.')
        print ('  B. Enter target directory name. eg -> target_directory/target_subfolder')
        print ('  C. Enter string to Find.')
        print ('  D. Enter string to Replace.')
        print ('')
        print ('  Menu')
        print ('')

        menuChoice = input('''
      1. All TXT  files. (*.txt )

      Enter Option: ''')
        print ('')

        # Format as int
        menuChoice = int(menuChoice)

        if menuChoice == 1:

            fextension = '*.txt'

            # Set directory name
            tdirectory = input('  Target directory name? ')
            tdirectory = str(tdirectory)
            print ('')

            # Set string to Find
            fstring = input('  String to find? (Ctrl + V) ')
            fstring = str(fstring)
            print ('')

            # Set string to Replace With
            rstring = input('  Replace with string? (Ctrl + V) ')
            rstring = str(rstring)
            print ('')

            # Report initial status
            print ('  Searching for occurrences of ' + fstring)
            print ('  Please wait...')
            print ('')

            # Call findReplace function
            findReplace('./' + tdirectory, fstring, rstring, fextension)

# Initialize program
mainMenu()

# Action Sample...
#findReplace("in this dir", "find string 1", "replace with string 2", "of this file extension")

# Confirm.
#print("done.")

Tags: python, python-3.x, file, recursion, directory

Solution


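It's good that you check that the input is a '.txt' file; it means you don't have to worry about passing 'rb' or 'wb' to open().

You said you don't want to allocate N bytes for an N-byte file, for fear that N might occasionally be large. It is better to limit memory allocation to the size of the longest text line rather than the size of the largest file. Let's break out a helper function. Delete/replace these lines: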

            #Open file correspondent to target filepath.
            with open(filepath) as f:
                # Read it into memory.
                s = f.read()
            # Find and replace all occurrences of (find).
            s = s.replace(find, replace)
            # Write these new changes to the target file path.
            with open(filepath, "w") as f:
                f.write(s)
                # increment search counter by one.
                searchCounter += 1

with a call to the helper function, followed by the counter increment:

            update(filepath, find, replace)
            searchCounter += 1

Then define the helper:

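def update(filepath, find, replace, temp_fspec='temp'):
    # Rewrite filepath one line at a time so only a single line is held in memory.
    assert temp_fspec != filepath, filepath
    with open(filepath) as fin:
        # The temporary file must be opened for writing.
        with open(temp_fspec, 'w') as fout:
            for line in fin:
                fout.write(line.replace(find, replace))
    # os.replace overwrites filepath on every platform; os.rename fails on
    # Windows when the destination already exists.
    os.replace(temp_fspec, filepath)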
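
Putting the pieces together, a self-contained sketch of the revised search might look like the following. It assumes the temporary file lives on the same filesystem as the target files, opens it in write mode, and uses os.replace for the final step. find_replace is a hypothetical variant of the question's findReplace that returns the number of files processed instead of updating a global.

```python
import fnmatch
import os


def update(filepath, find, replace, temp_fspec='temp'):
    """Rewrite filepath line by line by way of a temporary file."""
    assert temp_fspec != filepath, filepath
    with open(filepath) as fin, open(temp_fspec, 'w') as fout:
        for line in fin:
            fout.write(line.replace(find, replace))
    # Swap the rewritten copy into place, overwriting the original.
    os.replace(temp_fspec, filepath)


def find_replace(directory, find, replace, pattern):
    """Apply update() to every matching file under directory; return the count."""
    count = 0
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, pattern):
            update(os.path.join(path, filename), find, replace)
            count += 1
    return count
```

A call such as find_replace('./target_directory', 'old', 'new', '*.txt') would then take the place of the findReplace call in the menu code, with the returned count used for the status report.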

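Using fileinput isn't relevant here, since it would concatenate lines from many inputs into a single output stream, and your requirement is to associate each output with its own input. The `for line in` idiom is the important part, and it works the same way with fileinput as it does in the suggested update() helper.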
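
For readers who still want to try the module the question mentions: fileinput.input() accepts an inplace=True argument, in which case standard output is redirected to the file currently being read. The sketch below is one possible way to use it, calling fileinput once per file so that each output stays tied to its own input. find_replace_inplace is a hypothetical name, and the default text encoding is assumed.

```python
import fileinput
import fnmatch
import os


def find_replace_inplace(directory, find, replace, pattern):
    """Replace find with replace in every matching file under directory."""
    count = 0
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, pattern):
            filepath = os.path.join(path, filename)
            # With inplace=True, fileinput moves the original aside and
            # redirects standard output into a new file at filepath.
            with fileinput.input(files=[filepath], inplace=True) as lines:
                for line in lines:
                    # Each line keeps its newline, so suppress print's own.
                    print(line.replace(find, replace), end='')
            count += 1
    return count
```

As with update(), only one line is held in memory at a time; the difference is that fileinput manages the backup file and the swap itself.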

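Consider putting unusual characters in temp_fspec to reduce the chance of a collision, or make it a fully qualified path on the same filesystem but above the affected subtree, to guarantee it never collides.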
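
A different way to avoid collisions, not the one suggested above, is to let the standard tempfile module pick a unique name. The sketch below assumes the temporary file may be created in the same directory as the target, which also keeps both on the same filesystem. update_with_tempfile is a hypothetical name.

```python
import os
import shutil
import tempfile


def update_with_tempfile(filepath, find, replace):
    """Line-by-line replace that writes to a uniquely named temporary file."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # mkstemp creates a new, uniquely named file and returns an open descriptor.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fout, open(filepath) as fin:
            for line in fin:
                fout.write(line.replace(find, replace))
        # mkstemp files are private to the creating user, so copy the
        # original's permission bits before swapping the file into place.
        shutil.copymode(filepath, temp_path)
        os.replace(temp_path, filepath)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        os.remove(temp_path)
        raise
```

Because os.walk has already listed a directory's files before they are processed, the short-lived temporary file is not picked up by the walk that is in progress.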

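This version will typically take longer to run, especially for lengthy files containing short lines. If the maximum file size >> the maximum line length, this version's peak memory footprint should be much smaller. If very long lines are a concern, then a binary chunking approach would be more appropriate, one that carefully handles the case where find spans a chunk boundary. If we assume find does not contain a '\n' newline, we don't need to handle that case in the current code.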
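
The binary chunking idea can be sketched as follows. This is an illustration under stated assumptions rather than a drop-in replacement: replace_stream is a hypothetical helper, find and replace are bytes, and both streams are opened in binary mode. After each chunk it holds back the last len(find) - 1 unmatched bytes, since a match that starts there could finish in the next chunk.

```python
def replace_stream(fin, fout, find, replace, chunk_size=64 * 1024):
    """Copy fin to fout, replacing find with replace, one chunk at a time."""
    if not find:
        raise ValueError('find must not be empty')
    keep = len(find) - 1  # longest prefix of find that can end a buffer
    tail = b''
    while True:
        chunk = fin.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = 0
        while True:
            hit = buf.find(find, pos)
            if hit == -1:
                break
            # Emit the text before the match, then the replacement.
            fout.write(buf[pos:hit])
            fout.write(replace)
            pos = hit + len(find)
        # Hold back up to `keep` trailing bytes that follow the last match.
        cut = max(pos, len(buf) - keep)
        fout.write(buf[pos:cut])
        tail = buf[cut:]
    # Whatever is still held back cannot be part of a match any more.
    fout.write(tail)
```

It would be called with files opened as open(filepath, 'rb') and open(temp_fspec, 'wb'), with text arguments encoded first, for example find.encode('utf-8'). Memory use is then bounded by chunk_size plus len(find), regardless of line length.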

We can simplify the two versions of your clear-screen routine into one:

def clear_console():
    clear = 'cls' if os.name == 'nt' else 'clear'
    subprocess.call(clear, shell=True)
    return
