What method does os.listdir() use to obtain a list of files in a directory?

Question

I am working on a project where I have to edit a few lines of content in some 400 different files. They are all in the same folder, and have each got unique names. For the sake of this question, I will call them fileName001.conf to fileName420.conf.

I am using a python script to obtain the contents of each file before going on to make the edits programmatically. At the moment, I am using this snippet to get the files with some print() lines for debugging:

import os

folderPath = '/file/path/to/list/of/conf/files'

for filename in os.listdir(folderPath):
  print('filename = ' + filename)
  print('filepath = ' + folderPath + '/' + filename)

  with open(folderPath + '/' + filename, 'r') as currFile:
    #... code goes on...

The two print() lines are for debugging only. Running this, I noticed that the script was exhibiting some strange behaviour: the order in which the file names are printed seemed to change on each run. I took this a step further and added the line:

print(os.listdir(folderPath))

before the for loop in my first code snippet. Now when I run the script from the terminal, I can confirm that the output, while it contains all the file names, has a different order each time:

RafaGuillermo@virtualMachine:~$ python renamefiles.py
['fileName052.txt', 'fileName216.txt', 'fileName084.txt', 'fileName212.txt', 'fileName380.txt', 'fileName026.txt', 'fileName119.txt', etc...]

RafaGuillermo@virtualMachine:~$ python renamefiles.py
['fileName024.txt', 'fileName004.txt', 'fileName209.txt', 'fileName049.txt', 'fileName166.txt', 'fileName198.txt', 'fileName411.txt', etc...]

RafaGuillermo@virtualMachine:~$

As a workaround, since I want to go through the files in the same order each time, I can use

filenames = sorted(os.listdir(folderPath))

which alphabetises the list, though it seems counter-intuitive that os.listdir() returns the list of filenames in a different order each time I run the script.

My question is therefore not how can I get a sorted list of files in a directory using os.listdir(), but:

What method does os.listdir() use to retrieve a list of files and why does it seemingly populate its return value in a different way on each call?

Tags: python, list, arraylist, directory, filenames

Solution


Answer:

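This is the expected behaviour of os.listdir().

More information:

According to the Python Software Foundation documentation:

os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.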
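
As a rough illustration of what that documentation promises, the sketch below builds a throwaway directory with tempfile (the file names are made up for this example) and checks the two guarantees: the set of names is reliable, and '.' and '..' never appear. Nothing is asserted about the order, because none is promised.

```python
import os
import tempfile

# Hypothetical file names, created in a temporary directory for illustration.
expected = {"fileName001.conf", "fileName002.conf", "fileName003.conf"}

with tempfile.TemporaryDirectory() as folder_path:
    for name in expected:
        # Create an empty file for each name.
        open(os.path.join(folder_path, name), "w").close()

    entries = os.listdir(folder_path)

# The contents are guaranteed; the order of `entries` is not.
same_names = set(entries) == expected
has_special = "." in entries or ".." in entries
print(same_names, has_special)
```

Comparing as a set (or after sorting) is the safe way to test a listing, since comparing the raw list would depend on the filesystem's order.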

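os.listdir() is implemented in C, in posixmodule.c of the Python source code. The order of what it returns is based on the structure of the filesystem the files are stored on, and the implementation differs depending on conditional statements that determine the local operating system. os.listdir() opens the directory you call it on with the following C code: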

static PyObject *
_posix_listdir(path_t *path, PyObject *list) {
    /* stuff */
    dirp = opendir(name);

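This opens a stream for the directory whose name is stored in name, and returns a pointer to the directory stream, positioned at the first directory entry.

Continuing: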

for (;;) {
    errno = 0;
    Py_BEGIN_ALLOW_THREADS
    ep = readdir(dirp);
    Py_END_ALLOW_THREADS
    if (ep == NULL) {
        if (errno == 0) {
            break;
        } else {
            Py_DECREF(list);
            list = path_error(path);
            goto exit;
        }
    }
    if (ep->d_name[0] == '.' &&
        (NAMLEN(ep) == 1 ||
         (ep->d_name[1] == '.' && NAMLEN(ep) == 2)))
        continue;
    if (return_str)
        v = PyUnicode_DecodeFSDefaultAndSize(ep->d_name, NAMLEN(ep));
    else
        v = PyBytes_FromStringAndSize(ep->d_name, NAMLEN(ep));
    if (v == NULL) {
        Py_CLEAR(list);
        break;
    }
    if (PyList_Append(list, v) != 0) {
        Py_DECREF(v);
        Py_CLEAR(list);
        break;
    }
    Py_DECREF(v);
}

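readdir() is called with the previously assigned pointer to the directory stream passed as its argument. On Linux, readdir() returns a dirent structure representing the next entry in the directory stream pointed to by dirp.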
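
The logic of that loop can be paraphrased in Python. This is only a sketch of the filtering and appending steps, not how CPython is written: the function name collect_names is hypothetical, and the input list stands in for whatever order successive readdir() calls happen to return.

```python
def collect_names(raw_entries):
    """Mimic the loop's filtering: keep the order read, drop '.' and '..'."""
    names = []
    for entry in raw_entries:      # stands in for successive readdir() calls
        if entry in (".", ".."):   # the two special entries are skipped
            continue
        names.append(entry)        # appended in the order they were read
    return names


# Hypothetical readdir() order, chosen arbitrarily for illustration.
raw = [".", "fileName052.txt", "..", "fileName216.txt", ".hidden", "fileName084.txt"]
result = collect_names(raw)
print(result)
```

Note that nothing in the loop sorts or reorders: the resulting list simply preserves the order in which the entries arrived, and only the exact names '.' and '..' are dropped (other dot-files are kept).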

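As stated in the readdir() Linux manual page:

A directory stream is opened using opendir(3). The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.

So this behaviour is expected, and is a result of the filesystem implementation.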
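
Since the order comes from the filesystem, a stable order has to be imposed by the caller, as the question already does with sorted(). A minimal sketch, where the helper name sorted_listing and the file names are assumptions made for illustration:

```python
import os
import tempfile


def sorted_listing(folder_path):
    """Return the directory's entry names in a reproducible, sorted order."""
    return sorted(os.listdir(folder_path))


with tempfile.TemporaryDirectory() as folder_path:
    # Create a few empty, zero-padded files in a deliberately mixed order.
    for number in (52, 4, 216):
        name = "fileName{:03d}.conf".format(number)
        open(os.path.join(folder_path, name), "w").close()

    filenames = sorted_listing(folder_path)

print(filenames)
```

Plain sorted() compares strings character by character, so it matches numeric order here only because the numbers are zero-padded to the same width.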
