Converting a text file to XML in Python with only spaces as delimiters

Question

I have a text file that looks like this:

1   The star Antares is in which constellation              Scorpius
2   In Islam what is the third piller of wisdom - there's 5 in total        Charity - 2.5 % of
 income
3   Andrew Patterson wrote which definitive Australian song     Waltzing Matilda

I need to put it into XML format so that it reads like this:

<number> number of question</number>
<question> the question</question>
<answer> the last word in the line </answer>

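The only thing I have to work with is whitespace: there are 3 spaces between the number and the question, and 10 spaces between the question and the answer. Is this possible?

I know how to start: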

import xml.etree.ElementTree as ET

with open('xmlfile.xml', encoding='latin-1') as f:
  tree = ET.parse(f)
  root = tree.getroot()

Tags: python, xml

Answer


The following works, assuming the separators are exactly 3 and 10 spaces wide:

import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom
from typing import NamedTuple

SEP1 = '   '
SEP2 = '          '


def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

class QuestionAndAnswer(NamedTuple):
  num: str
  question: str
  answer: str

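data = []
with open('input.txt') as f:
  for line in f:
    line = line.strip()
    # SEP1 ends the number; SEP2 ends the question.
    sep1_idx = line.find(SEP1)
    sep2_idx = line.find(SEP2)
    if sep1_idx == -1 or sep2_idx == -1:
      continue  # skip blank or malformed lines
    num = line[:sep1_idx]
    # strip() removes padding left over when a gap is wider than the separator.
    question = line[sep1_idx + len(SEP1):sep2_idx].strip()
    answer = line[sep2_idx + len(SEP2):].strip()
    data.append(QuestionAndAnswer(num, question, answer))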


root = ET.Element('root')
for entry in data:
  q_and_a = ET.SubElement(root,'q_and_a')
  n = ET.SubElement(q_and_a,'number')
  n.text = entry.num
  q = ET.SubElement(q_and_a,'question')
  q.text = entry.question
  a = ET.SubElement(q_and_a,'answer')
  a.text = entry.answer

tree_str = prettify(root)
# prettify() returns a str; write it out as UTF-8, the XML default encoding.
with open('output.xml', 'w', encoding='utf-8') as out:
  out.write(tree_str)
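
On Python 3.9 or later, `ET.indent` can probably stand in for the `prettify` helper and the `minidom` round trip. The snippet below is only a minimal sketch of that alternative: it builds a single hard-coded entry instead of reading `input.txt`, and the three-space indent is an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Build a one-entry tree by hand so the sketch is self-contained.
root = ET.Element("root")
item = ET.SubElement(root, "q_and_a")
ET.SubElement(item, "number").text = "1"
ET.SubElement(item, "question").text = "The star Antares is in which constellation"
ET.SubElement(item, "answer").text = "Scorpius"

# ET.indent (Python 3.9+) inserts newlines and indentation in place.
ET.indent(root, space="   ")

# encoding="unicode" makes tostring return a str rather than bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because `ET.indent` modifies the tree in place, the indented tree can also be written directly with `ET.ElementTree(root).write(...)` instead of going through a string.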

input.txt

1   The star Antares is in which constellation          Scorpius
2   In Islam what is the third piller of wisdom - there's 5 in total          Charity - 2.5 % of

Output

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <q_and_a>
      <number>1</number>
      <question>The star Antares is in which constellation</question>
      <answer>Scorpius</answer>
   </q_and_a>
   <q_and_a>
      <number>2</number>
      <question>In Islam what is the third piller of wisdom - there's 5 in total</question>
      <answer>Charity - 2.5 % of</answer>
   </q_and_a>
</root>
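
The gaps in the question's sample are not all exactly 3 and 10 spaces wide, and one answer wraps onto a second line (" income"). A more forgiving variant could split on any run of two or more whitespace characters and fold wrapped lines into the previous answer. This is a sketch under those assumptions, not a drop-in replacement: `GAP`, `parse_lines`, and `build_xml` are hypothetical names, and the approach would break if a question or answer itself contained two consecutive spaces.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: fields are separated by runs of two or more whitespace characters.
GAP = re.compile(r"\s{2,}")


def parse_lines(lines):
    """Return a list of (number, question, answer) tuples."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = GAP.split(line, maxsplit=2)
        if len(parts) == 3 and parts[0].isdigit():
            records.append(tuple(parts))
        elif records:
            # Assumption: any other line is a wrapped continuation
            # of the previous answer.
            num, question, answer = records[-1]
            records[-1] = (num, question, f"{answer} {line}")
    return records


def build_xml(records):
    """Build the same <root>/<q_and_a> layout used in the answer."""
    root = ET.Element("root")
    for num, question, answer in records:
        item = ET.SubElement(root, "q_and_a")
        ET.SubElement(item, "number").text = num
        ET.SubElement(item, "question").text = question
        ET.SubElement(item, "answer").text = answer
    return root


sample = [
    "1   The star Antares is in which constellation              Scorpius",
    "2   In Islam what is the third piller of wisdom - there's 5 in total        Charity - 2.5 % of",
    " income",
    "3   Andrew Patterson wrote which definitive Australian song     Waltzing Matilda",
]
records = parse_lines(sample)
root = build_xml(records)
print(ET.tostring(root, encoding="unicode"))
```

With the sample lines above, the wrapped " income" line ends up appended to the second answer rather than being parsed as its own record.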
