How to return and read multiple .xml files in Python

Question

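I am writing a Python script that walks through a folder and its subfolders, which contain more than 100 files, and reads only the XML files. If I hard-code this logic outside a function, it reads all 100 XML files into temp0. But if I put the code in a function and use return, the function always returns just one file; that is, it reads only one file. Can anyone explain why return works this way? Thanks in advance.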

def raw_input(doc):
    for root, dirs, packs in doc:
        for files in packs:
            if files == 'abc.xml':
                filename = os.path.join(root, files)
                open_file = open(filename, 'r')
                perpX_ = open_file.read()
                # print(perpX_)
                outputX_ = re.compile('<test (.*?)</text>', re.DOTALL | re.IGNORECASE).findall(perpX_)
                temp0 = str("|".join(outputX_))
                #print(temp0)
                return temp0

doc=os.walk('./data/')
raw_input(doc)

temp0 = raw_input(doc)
print(temp0)

Tags: python, python-3.x, xml, function, return

Answer


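return hands back the result of a function: as soon as a return statement is reached, Python exits the function and uses the value of the expression that follows return as the function's output.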

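Your return sits inside the for loops, so it is reached the first time a file name matches. At that point the Python interpreter treats temp0 as the final result of the call and exits the function, so the remaining files are never read.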
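The difference can be seen in isolation with a small sketch. The function names and the sample list below are made up for illustration and are not part of the original script.

```python
def first_match(items):
    # return inside the loop: the function ends at the first matching item
    for item in items:
        if item.startswith("a"):
            return item


def all_matches(items):
    # collect inside the loop, return once after the loop has finished
    found = []
    for item in items:
        if item.startswith("a"):
            found.append(item)
    return found


names = ["abc", "xyz", "abd"]
print(first_match(names))   # abc
print(all_matches(names))   # ['abc', 'abd']
```

If nothing matches, first_match falls off the end of the function and returns None.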

You can collect the value for each file in a list and return only after the loops have finished, for example like this:

import os
import re

def raw_input(doc):
    result = []    # this is where the output is aggregated
    for root, dirs, packs in doc:
        for files in packs:
            if files == 'abc.xml':
                filename = os.path.join(root, files)
                # the with block closes the file once it has been read
                with open(filename, 'r') as open_file:
                    perpX_ = open_file.read()
                outputX_ = re.compile('<test (.*?)</text>', re.DOTALL | re.IGNORECASE).findall(perpX_)
                # append the output for the current file to the list
                result.append("|".join(outputX_))
    # return the combined string at the end of the function,
    # AFTER the for loops
    return '|'.join(result)

doc = os.walk('./data/')

temp0 = raw_input(doc)
print(temp0)

This way, you get the output of all files as a single string.
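If you would rather keep the results of each file separate, one option is to return the list itself. The following is only a sketch under assumptions: the name collect_matches, the target_name parameter, and the pattern TEXT_RE are hypothetical, and the pattern assumes the data sits in text elements, which may not match your files.

```python
import os
import re

# Hypothetical pattern for illustration; adjust it to the tags in your files.
TEXT_RE = re.compile(r"<text[^>]*>(.*?)</text>", re.DOTALL | re.IGNORECASE)


def collect_matches(top, target_name="abc.xml"):
    """Return a list of (path, matches) pairs, one per matching file under top."""
    results = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name == target_name:
                path = os.path.join(root, name)
                with open(path, "r") as handle:
                    results.append((path, TEXT_RE.findall(handle.read())))
    # a single return, after every directory has been visited
    return results


for path, matches in collect_matches("./data/"):
    print(path, "|".join(matches))
```

Returning the list keeps track of which file each match came from, and the caller can still join everything into one string when needed.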

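In addition, there is such a thing as a generator. Generators are objects you can iterate over, and they let you make your code lazily evaluated (on demand):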

import os
import re

# now raw_input is a generator
def raw_input(doc):
    # we don't need a storage list now
    for root, dirs, packs in doc:
        for files in packs:
            if files == 'abc.xml':
                filename = os.path.join(root, files)
                with open(filename, 'r') as open_file:
                    perpX_ = open_file.read()
                outputX_ = re.compile('<test (.*?)</text>', re.DOTALL | re.IGNORECASE).findall(perpX_)
                # yield the current value; the function pauses here until the next value is requested
                yield "|".join(outputX_)

doc = os.walk('./data/')
results = raw_input(doc)
# now results is a generator. It is not evaluated yet
# you can get the first output like this
# (next raises StopIteration if no value is left):
first_out = next(results)
# and then the second:
second_out = next(results)
# or iterate over it, just like over an ordinary list:
for res in results:
    print(res)
# note that the loop only covers the remaining values
# (excluding the first and second ones, which were already consumed)

# and now results is exhausted (we've reached the end of the generator)
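The one-pass behaviour of generators is easy to check on a toy example. This is a minimal sketch: numbered is a made-up stand-in for the generator version of raw_input, so that it runs without any files on disk.

```python
def numbered(limit):
    # a tiny generator standing in for raw_input(doc)
    for i in range(limit):
        yield f"file-{i}"


gen = numbered(4)
first = next(gen)              # takes the first value
rest = list(gen)               # consumes everything that is left
leftover = next(gen, None)     # the default is returned instead of raising StopIteration

# a fresh generator can be passed straight to join
joined = "|".join(numbered(4))
print(first, rest, leftover, joined)
```

Passing a default to next avoids the StopIteration error on an exhausted generator, and calling the generator function again gives a new generator that starts from the beginning.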
