How do I iterate over multiple files by name within a given range?

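Problem description

I am trying to iterate over multiple XML files from a library containing more than 100k files, and I need to select files by the last 3 digits of their names. The expected result is a list of files from "asset-PD471090" to "asset-PD471110", or from "asset-GT888185" to "asset-GT888209", and so on.

My code: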

'''

import glob

strtid = input('From ID: ') # First file in range
seps = strtid[-3:]
endid = input('To ID: ') # Last file in range
eeps = endid[-3:] 
FileId = strtid[:5] # always same File Id for whole range

for name in glob.iglob('asset-' + FileId + [seps-eeps] + '.xml', recursive=True):
    print(name) # iterate over every file in given range and print file names.

'''

The error I get is:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

How can I load the input files in a specific range?

Tags: python-3.x, glob

Solution


As the error tells you, you are trying to use - on strings:

strtid = input('From ID: ') # string
seps = strtid[-3:]          # part of a string

endid = input('To ID: ')    # string 
eeps = endid[-3:]           # part of a string

FileId = strtid[:5]         # also part of a string 

# [seps-eeps]: trying to subtract a string from a string:
for name in glob.iglob('asset-' + FileId + [seps-eeps] + '.xml', recursive=True):

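You can convert a string to an integer with int("1234"), but that will not help you much here: subtracting the two integers leaves a single (wrong) number in your iglob pattern, not a range.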
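To illustrate the point, here is a minimal sketch using the example IDs from the question (the variable names `start_id`, `end_id`, `first` and `last` are only illustrative). It shows that the subtraction produces one number, whereas `range` produces every number in between, and that the leading zeros lost by `int` have to be restored when building a file name.

```python
start_id = "PD471090"  # example "From ID" taken from the question
end_id = "PD471110"    # example "To ID" taken from the question

first = int(start_id[-3:])  # "090" becomes 90
last = int(end_id[-3:])     # "110" becomes 110

# Subtracting gives a single number, which is not a pattern for a range.
print(first - last)  # -20

# range() enumerates every number; add 1 because the upper bound is exclusive.
numbers = list(range(first, last + 1))
print(numbers[:3], len(numbers))  # [90, 91, 92] 21

# int() drops leading zeros, so pad again when rebuilding a name.
print(f"{first:03}")  # 090
```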

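If you want to supply them as a glob pattern, the brackets have to be part of the pattern string itself, and even then glob does not support numeric ranges: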

  • "[123-678]" matches a single character that is one of 1, 2, 3, 4, 5, 6, 7, 8; it does not match the numbers 123 to 678
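This behaviour can be checked without touching the file system. The sketch below uses `fnmatch.fnmatchcase` from the standard library, which applies the same shell-style pattern rules that `glob` relies on; the variable names are only illustrative.

```python
import fnmatch

pattern = "[123-678]"  # the character class discussed above

# Test every single digit against the pattern.
digits = [str(n) for n in range(10)]
matching = [d for d in digits if fnmatch.fnmatchcase(d, pattern)]
print(matching)  # ['1', '2', '3', '4', '5', '6', '7', '8']

# A three-digit string never matches, because the class stands for one character.
print(fnmatch.fnmatchcase("123", pattern))  # False
print(fnmatch.fnmatchcase("400", pattern))  # False
```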

However, you can test the file names yourself:

import os

def get_files(directory, prefix, postfix, numbers):
    """Yield paths of files named <prefix><number><postfix>.xml whose number is in numbers."""
    suffix = postfix + ".xml"  # your id + ".xml"
    for root, dirs, files in os.walk(directory):
        for file in sorted(files):  # sorted to get files in order, might not need it
            if not (file.startswith(prefix) and file.endswith(suffix)):
                continue  # skip files that do not follow the naming scheme
            middle = file[len(prefix):len(file) - len(suffix)]
            if middle.isdigit() and int(middle) in numbers:
                yield os.path.join(root, file)

d = "test"
prefix = "asset-GT"  # input("Basename: ")
postfix = "185"      # input("Id: ")

# create demo files to search in
os.makedirs(d, exist_ok=True)  # do not fail if the directory already exists
for i in range(50, 100):
    with open(os.path.join(d, f"{prefix}{i:03}{postfix}.xml"), "w") as f:
        f.write("")

# search params
fromto = "75 92"     # input("From To (space separated numbers): ")

fr, to = map(int, fromto.strip().split())
to += 1  # range upper limit is exclusive, so need to add 1 to include it

all_searched = list(get_files("./test", prefix, postfix, range(fr, to)))
print(*all_searched, sep="\n")

Output:

./test/asset-GT075185.xml
./test/asset-GT076185.xml
./test/asset-GT077185.xml
./test/asset-GT078185.xml
./test/asset-GT079185.xml
./test/asset-GT080185.xml
./test/asset-GT081185.xml
./test/asset-GT082185.xml
./test/asset-GT083185.xml
./test/asset-GT084185.xml
./test/asset-GT085185.xml
./test/asset-GT086185.xml
./test/asset-GT087185.xml
./test/asset-GT088185.xml
./test/asset-GT089185.xml
./test/asset-GT090185.xml
./test/asset-GT091185.xml
./test/asset-GT092185.xml
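With more than 100k files, walking the whole tree for a range of about twenty names may be slower than necessary. A possible alternative, sketched here under the assumption that the two IDs differ only in their last three digits and that the files sit in one known directory, is to build the expected names from the range and check which of them exist. The helpers `names_in_range` and `existing_files` are hypothetical names introduced for this sketch, not part of the original answer.

```python
import os

def names_in_range(start_id, end_id, width=3, prefix="asset-", ext=".xml"):
    """Return the file names from start_id to end_id, both inclusive.

    Assumes both IDs are identical except for their last `width` digits.
    """
    base = start_id[:-width]
    if end_id[:-width] != base:
        raise ValueError("IDs differ outside the numeric suffix")
    first, last = int(start_id[-width:]), int(end_id[-width:])
    # Re-pad each number to `width` digits so 90 becomes "090" again.
    return [f"{prefix}{base}{n:0{width}d}{ext}" for n in range(first, last + 1)]

def existing_files(directory, names):
    """Yield the paths of those names that exist as files in directory."""
    for name in names:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            yield path

names = names_in_range("PD471090", "PD471110")
print(names[0], names[-1], len(names))
# asset-PD471090.xml asset-PD471110.xml 21
```

Calling `list(existing_files(".", names))` would then return only the files of that range that are actually present in the current directory. Unlike the `os.walk` version, this sketch does not search subdirectories.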
