Fixing file names and updating references with Python

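Problem description

I have some files whose names are not meaningful. I want to rename each file to the value of its title element, lowercased and with spaces replaced by underscores, and update the references between the files. Can anyone help? I am new to Python and only know that BeautifulSoup can be used to parse XML files. The sample files look like this: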

Content of xyz1.xml file ->

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id ="id1"><title>Test1 Topic</title><conbody>
<section>

<p>Testing this topic </p>
<p><xref href="xyz2.xml"/></p>
</section>
</conbody></concept>

Content of xyz2.xml file ->
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id ="id1"><title>Test2 Topic</title><conbody>
<section>

<p>Testing this topic </p>

<p><xref href="xyz1.xml"/></p>
</section>
</conbody></concept>

Expected output ->
Content of test1_topic.xml ->
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id ="id1"><title>Test1 Topic</title><conbody>
<section>

<p>Testing this topic </p>
<p><xref href="test2_topic.xml"/></p>
</section>
</conbody></concept>

Content of test2_topic.xml ->
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id ="id1"><title>Test2 Topic</title><conbody>
<section>

<p>Testing this topic </p>

<p><xref href="test1_topic.xml"/></p>
</section>
</conbody></concept>

Tags: python-3.x

Solution


This code renames every file in a given folder after the title in its XML, and updates the references to the old file names.

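from bs4 import BeautifulSoup
import os

folder = "folder_name"  # placeholder: set this to the folder that holds the XML files

# Only look at XML files that sit directly inside the folder.
files = [name for name in os.listdir(folder) if name.endswith(".xml")]

# First pass: map each current file name to the new name built from its <title>.
change_href = {}
for file in files:
    file_location = os.path.join(folder, file)
    print(file_location)
    with open(file_location, encoding="utf-8") as f:
        # Name the parser explicitly; "html.parser" ships with Python.
        soup = BeautifulSoup(f.read(), "html.parser")
    change_href[file] = "{}.xml".format(soup.find("title").text.lower().replace(" ", "_"))

# Second pass: update the references in every file, then rename the file.
for file in files:
    file_location = os.path.join(folder, file)
    with open(file_location, "r+", encoding="utf-8") as f:
        content = f.read()
        # Note: this replaces the old name anywhere in the text, not only in href.
        for k, v in change_href.items():
            content = content.replace(k, v)
        f.seek(0)
        f.write(content)
        f.truncate()  # drop leftover bytes when the new content is shorter
    if file != change_href[file]:
        os.rename(file_location, os.path.join(folder, change_href[file]))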
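
The script above swaps the old file names wherever they occur in the text. If you would rather touch only the href attributes, something along these lines might work. It is a rough sketch, not a drop-in replacement: it uses the standard library (xml.etree.ElementTree plus a regular expression) instead of BeautifulSoup, and the names title_to_filename, rename_and_relink and HREF_RE are made up for this example. It assumes all files are UTF-8, sit directly in one folder, and reference each other by bare file name in double-quoted href attributes.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Matches the file part of href="..." up to the closing quote or a '#' fragment.
HREF_RE = re.compile(r'href="([^"#]+)')


def title_to_filename(title):
    """Lowercase the title and join its words with underscores."""
    return "_".join(title.lower().split()) + ".xml"


def rename_and_relink(folder):
    """Rename each *.xml file in `folder` after its first <title> and fix hrefs."""
    folder = Path(folder)
    texts, mapping = {}, {}
    for path in sorted(folder.glob("*.xml")):
        texts[path] = path.read_text(encoding="utf-8")
        title = ET.parse(path).getroot().find(".//title")
        if title is not None:
            mapping[path.name] = title_to_filename("".join(title.itertext()))

    # Assumption: no two files may end up with the same name.
    new_names = [mapping.get(path.name, path.name) for path in texts]
    if len(set(new_names)) != len(new_names):
        raise ValueError("titles would produce clashing file names")

    def swap(match):
        old = match.group(1)
        # Leave hrefs alone when they do not point at a renamed file.
        return 'href="' + mapping.get(old, old)

    # Every file is already in memory, so remove the old files first and
    # then write the new ones; this avoids deleting a freshly written file
    # when a new name equals another file's old name.
    for path in texts:
        if path.name in mapping:
            path.unlink()
    for path, text in texts.items():
        target = folder / mapping.get(path.name, path.name)
        target.write_text(HREF_RE.sub(swap, text), encoding="utf-8")
    return mapping
```

Calling rename_and_relink("folder_name") (with your own folder in place of the placeholder) returns the old-name to new-name mapping. Because only the href values are rewritten with a regular expression, the rest of each file, including the DOCTYPE line, is written back unchanged. Since the old files are deleted before the new ones are written, try it on a copy of the folder first.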
