Python: custom sorting of files read from a directory

Problem description

I have a directory with the following structure:

Main directory:
|--2001
   |--200101
      |--feed_013_01.zip
      |--feed_restr_013_01.zip
      |--feed_013_04.zip
      |--feed_restr_013_04.zip
      ...
      |--feed_013_30.zip
      |--feed_restr_013_30.zip
...
|--2021
   |--202101
      |--feed_013_01.zip
      |--feed_restr_013_01.zip
      |--feed_013_04.zip
      |--feed_restr_013_04.zip
      ...
      |--feed_013_30.zip
      |--feed_restr_013_30.zip

I need to read the zip files sorted in this order:

feed_restr_013_30.zip, feed_013_30.zip.....feed_restr_013_01.zip, feed_013_01.zip

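This is what I am currently doing:

import os
import re

def atoi(text):
    # Turn digit runs into ints so they compare numerically
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split into digit and non-digit chunks for natural ordering
    return [atoi(c) for c in re.split(r'(\d+)', text)]

for path, subdirs, files in os.walk(directory):
    # Sorting subdirs in place also controls the order in which os.walk descends
    subdirs.sort(key=natural_keys)
    subdirs.reverse()
    files.sort(key=natural_keys)
    files.reverse()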

This puts all the "restr" files first, so the list I get looks like this:

feed_restr_013_30.zip,feed_restr_013_01.zip.....feed_013_30.zip, feed_013_01.zip
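To see why this happens, it may help to print the keys that natural_keys produces. The following is a minimal sketch using four sample names taken from the directory listing above; the variable names `names` and `result` are only illustrative.

```python
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)', text)]

names = [
    "feed_013_01.zip",
    "feed_restr_013_01.zip",
    "feed_013_30.zip",
    "feed_restr_013_30.zip",
]

# The first chunk of each key is the text before the first digit run:
# 'feed_' for plain files and 'feed_restr_' for restr files.
for name in names:
    print(name, natural_keys(name))

# Lists compare element by element, so the first chunk decides the order
# before the day number is ever looked at.
result = sorted(names, key=natural_keys, reverse=True)
print(result)
```

Because 'feed_restr_' compares greater than 'feed_', every restr file lands ahead of every plain file in the reversed list, regardless of the day number.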

Update

I was able to solve this by combining the answers from buran and SCKU with my existing logic:

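import os
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def parse(fname):
    # 'feed_restr_013_30.zip' -> prefix, optional middle parts, two trailing fields
    try:
        prefix, *middle, n1, n2 = fname.split('_')
    except ValueError:
        # Fewer than three '_'-separated parts: no second trailing field
        prefix, *middle, n1 = fname.split('_')
        n2 = ''
    return (prefix, n1, [atoi(c) for c in re.split(r'(\d+)', n2)], ''.join(middle))

def get_Files(directory, source, keywords):
    # Collect the full path of every file below directory
    file_paths = []
    for path, subdirs, files in os.walk(directory):
        for file in files:
            file_name = os.path.join(path, file)
            file_paths.append(file_name)
    return file_paths

files = get_Files(directory, source, keywords)
# parse receives the full path, so prefix also carries the directory part
files.sort(key=parse, reverse=True)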
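
As a quick check of the key used above, here is a small sketch that applies the same parse function to bare file names rather than full paths. The sample list and the name `ordered` are assumptions made for illustration only.

```python
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def parse(fname):
    try:
        prefix, *middle, n1, n2 = fname.split('_')
    except ValueError:
        prefix, *middle, n1 = fname.split('_')
        n2 = ''
    # The day field comes before the joined middle part, so the day is
    # compared first and 'restr' only breaks ties within the same day.
    return (prefix, n1, [atoi(c) for c in re.split(r'(\d+)', n2)], ''.join(middle))

sample = [
    "feed_013_01.zip", "feed_restr_013_01.zip",
    "feed_013_04.zip", "feed_restr_013_04.zip",
    "feed_013_30.zip", "feed_restr_013_30.zip",
]

ordered = sorted(sample, key=parse, reverse=True)
print(ordered)
```

Within each day the restr file comes first because 'restr' compares greater than the empty string and the sort is reversed.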

Tags: python, sorting, python-os

Solution

If your directory structure is regular and not too large, I suggest collecting all file paths first and sorting them in one go:

import os

# collect every file together with its path
all_files_path = []
for path, subdirs, files in os.walk(directory):
    for f in files:
        all_files_path.append(os.path.join(path, f))

# define custom sort key function
def which_items_you_want_to_compare(fpath):
    #from buran's answer for sorting the part of file name
    def parse(fname):
        prefix, *middle, n1, n2 = fname.split('_')
        return (prefix, n1, n2, ''.join(middle))

    fpath_split = fpath.split(os.path.sep)
    fn = fpath_split[-1] # file name 'feed_restr_013_01.zip'
    sort_key_fn = parse(fn) # from buran's answer
    d_ym = fpath_split[-2] # dir '202101'
    d_y = fpath_split[-3] # dir '2021'
    
    # compare by year first, then month (last two characters of d_ym), then the file name key from buran's answer
    return (int(d_y), int(d_ym[4:])) + sort_key_fn


sorted_res = sorted(all_files_path, key=which_items_you_want_to_compare, reverse=True)

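If you do not want a component such as the year in reverse order, negate it in the key function (for example, use -int(d_y)) so that it is flipped back under reverse=True.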
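
A minimal sketch of that idea, assuming the same year/month/file layout as above. The function name `key_year_ascending` and the sample paths under a hypothetical "main" directory are made up for illustration.

```python
import os

def parse(fname):
    prefix, *middle, n1, n2 = fname.split('_')
    return (prefix, n1, n2, ''.join(middle))

def key_year_ascending(fpath):
    parts = fpath.split(os.path.sep)
    fn, d_ym, d_y = parts[-1], parts[-2], parts[-3]
    # Negating the year cancels out reverse=True for that component only.
    return (-int(d_y), int(d_ym[4:])) + parse(fn)

# Hypothetical sample paths mirroring the layout in the question
paths = [
    os.path.join("main", "2021", "202101", "feed_013_01.zip"),
    os.path.join("main", "2021", "202101", "feed_restr_013_01.zip"),
    os.path.join("main", "2001", "200101", "feed_013_01.zip"),
    os.path.join("main", "2001", "200101", "feed_restr_013_01.zip"),
]

mixed = sorted(paths, key=key_year_ascending, reverse=True)
for p in mixed:
    print(p)
```

Here the years come out oldest first while the files inside each month still follow the reversed order. Note that negation only works for numeric components; string parts of the key cannot be flipped this way.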

