Pandas DataFrame to XML with added data

Problem description

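I'm having a hard time generating an .xml file from a Pandas DataFrame. I'm using this solution (How to convert a pandas/dataframe to XML?) (sorry, for some reason Stack Overflow won't let me link a word to the site), but I'm trying to add an extra field. The original solution works if I don't include the shape parameters, but I do need those values in the .xml file, and I don't understand why I can't call the function with the extra parameters. Besides calling the function, I'm also having trouble writing the result out as XML. I searched some other Stack Overflow questions and found that the following code block works, but when I open the .xml file I only see four numbers (30, 1, 67, 44), although if I open it in PyCharm I get the "wanted" view.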

with open("output.xml", "w") as file_handle:
    Q.writexml(file_handle)
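For reference, writexml() is defined on xml.dom.minidom nodes such as Document, not on str, so the snippet above only works if Q is a minidom object rather than the string a function like func returns. A minimal sketch of both ways to get the XML onto disk, assuming a short placeholder string stands in for the real output:

```python
from xml.dom import minidom

# Placeholder for the string built from the DataFrame (assumption for this sketch)
xml_string = '<annotations><item><xmin>30</xmin><ymin>1</ymin></item></annotations>'

# Option 1: a plain str has no writexml(); write it with file.write()
with open('output.xml', 'w', encoding='utf-8') as fh:
    fh.write(xml_string)

# Option 2: parse it into a minidom Document, which does provide writexml()
doc = minidom.parseString(xml_string)
with open('output_pretty.xml', 'w', encoding='utf-8') as fh:
    # addindent/newl pretty-print the tree; encoding is written into the XML declaration
    doc.writexml(fh, addindent='  ', newl='\n', encoding='utf-8')
```

Option 2 is only needed if indentation or an XML declaration is wanted; otherwise writing the string directly is enough.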

Code:

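print(image_x.shape)
# output: (185, 186, 3)

# Note: NumPy image arrays are usually (height, width, channels); here the
# first value is used as width so that it matches the desired output below
width, height, depth = image_x.shape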

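def func(row, width, height, depth):
    xml = ['<item>']
    shape = [f'<width>{width}</width>\n<height>{height}</height>\n<depth>{depth}</depth>']
    for field in row.index:
        # row is the whole DataFrame here, so row.index is (0, 1) and
        # row[field] looks up column 0, which raises the KeyError: 0 shown below
        xml.append('  <{0}>{1}</{0}>'.format(field, row[field]))
    xml.append('</item>')
    xml.append(shape)  # appends a list; '\n'.join() would then raise TypeError
    return '\n'.join(xml)

xml_file = func(df, width, height, depth)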

DF:

   xmin  ymin  xmax  ymax
0    30     1    67    44
1    39   136    73   176

Error:

Traceback (most recent call last):
  File "D:\PyCharmEnvironments\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:/PycharmProjects/Augmentation/random_shit.py", line 100, in <module>
    Q = func(df, width, height, depth)
  File "D:/PycharmProjects/Augmentation/random_shit.py", line 95, in func
    xml.append('  <{0}>{1}</{0}>'.format(field, row[field]))
  File "D:\PyCharmEnvironments\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\PyCharmEnvironments\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 0
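The traceback comes from passing the whole DataFrame as row: row.index is then the row labels (0, 1), so row[field] asks for a column named 0. A minimal sketch of the question's per-row approach with that fixed, assuming the column names shown in DF above; the helper row_to_item is a hypothetical name, and DataFrame.iterrows() is used to hand it one row at a time while the size block is emitted once:

```python
import pandas as pd


def row_to_item(row):
    """Turn ONE DataFrame row (a Series indexed by column names) into an <item> block."""
    lines = ['  <item>']
    for field in row.index:  # column names: xmin, ymin, xmax, ymax
        lines.append(f'    <{field}>{row[field]}</{field}>')
    lines.append('  </item>')
    return '\n'.join(lines)


def func(df, width, height, depth):
    """Emit the <size> block once, then one <item> per DataFrame row."""
    xml = [
        '<annotations>',
        '  <size>',
        f'    <width>{width}</width>',
        f'    <height>{height}</height>',
        f'    <depth>{depth}</depth>',
        '  </size>',
    ]
    # iterrows() yields (index_label, row_Series) pairs, so row_to_item
    # gets a single row and row[field] is a lookup by column name.
    for _, row in df.iterrows():
        xml.append(row_to_item(row))
    xml.append('</annotations>')
    return '\n'.join(xml)


df = pd.DataFrame({'xmin': [30, 39], 'ymin': [1, 136],
                   'xmax': [67, 73], 'ymax': [44, 176]})
print(func(df, 185, 186, 3))
```

One caveat: iterrows() returns each row as a Series, so if the columns mixed ints and floats, the ints would be upcast (30 would print as 30.0); itertuples() keeps per-column dtypes.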

Desired output:

<annotations>
  <size>
    <width>185</width>
    <height>186</height>
    <depth>3</depth>
  </size>
  <item>
    <xmin>30</xmin>
    <ymin>1</ymin>
    <xmax>67</xmax>
    <ymax>44</ymax>
  </item>
  <item>
    <xmin>39</xmin>
    <ymin>136</ymin>
    <xmax>73</xmax>
    <ymax>176</ymax>
  </item>
</annotations>

Tags: python, xml, pandas, dataframe

Solution


One-liner function:

def func(df, width, height, depth):
    # apply(axis=1) maps each row to an <item> string; str.cat() joins them into one string
    return '<annotations>\n' + f'<size>\n<width>{width}</width>\n<height>{height}</height>\n<depth>{depth}</depth>\n</size>\n' + df.apply(lambda row: f'<item>\n<xmin>{row.xmin}</xmin>\n<ymin>{row.ymin}</ymin>\n<xmax>{row.xmax}</xmax>\n<ymax>{row.ymax}</ymax>\n</item>\n', axis=1).str.cat() + '</annotations>'

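This builds the string with apply and str.cat, a map-reduce approach over the DataFrame: apply converts each DataFrame row into a string equivalent to one <item> element, and str.cat() concatenates all of those row strings. (The input parameter row has also been renamed to df.)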
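The map and reduce steps can be looked at separately on the DF from the question. This sketch trims each <item> down to xmin only, to keep the printed output short:

```python
import pandas as pd

df = pd.DataFrame({'xmin': [30, 39], 'ymin': [1, 136],
                   'xmax': [67, 73], 'ymax': [44, 176]})

# Map: apply(axis=1) calls the lambda once per row and returns a Series of strings
items = df.apply(lambda row: f'<item><xmin>{row.xmin}</xmin></item>', axis=1)
print(items.tolist())
# ['<item><xmin>30</xmin></item>', '<item><xmin>39</xmin></item>']

# Reduce: str.cat() with no sep concatenates every element into a single str
print(items.str.cat())
# <item><xmin>30</xmin></item><item><xmin>39</xmin></item>

# A separator can be passed if one line per item is preferred
print(items.str.cat(sep='\n'))
```

Because each row string in the one-liner already ends with '\n', no separator is needed there.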

