Python write() argument must be str

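Problem description

I am using this Python script to convert a WordPress .xml export file to .txt. It works for blog posts, but it fails on other post types.

I have changed some of the code, but it still does not work properly for either the other post types or blog posts. This is the code I currently have: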

#!/usr/bin/env python

"""This script converts WXR file to a number of plain text files.
WXR stands for "WordPress eXtended RSS", which basically is just a
regular XML file. This script extracts entries from the WXR file into
plain text files. Output format: article name prefixed by date for
posts, article name for pages.
Usage: wxr2txt.py filename [-o output_dir]
"""

import os
import re
import sys
from xml.etree import ElementTree

NAMESPACES = {
        'content': 'http://purl.org/rss/1.0/modules/content/',
        'wp': 'http://wordpress.org/export/1.2/',
}
USAGE_STRING = "Usage: wxr2txt.py filename [-o output_dir]"


def main(argv):
    filename, output_dir = _parse_and_validate_output(argv)
    try:
        data = ElementTree.parse(filename).getroot()
    except ElementTree.ParseError:
        _error("Invalid input file format. Can not parse the input.")
    page_counter, post_counter = 0, 0
    for post in data.find('channel').findall('item'):
        post_type = post.find('wp:post_type', namespaces=NAMESPACES).text

        content = post.find('content:encoded', namespaces=NAMESPACES).text
        date = post.find('wp:post_date', namespaces=NAMESPACES).text
        title = post.find('title').text
        date = date.split(' ')[0].replace('-', '')
        title = re.sub(r'[_]+', '_', re.sub(r'[^a-z0-9+]', '_', title.lower()))

        if post_type == 'post':
            post_filename = date + '_' + title + '.txt'
            post_counter += 1
        else:
            post_filename = title + '.txt'
            page_counter += 1
        with open(os.path.join(output_dir, post_filename), 'w') as post_file:
            post_file.write(content.encode('utf8'))

        post_counter += 1
    print("Saved {} posts and {} pages in directory '{}'.".format(
            post_counter, page_counter, output_dir))


def _parse_and_validate_output(argv):
    if len(argv) not in (2, 4):
        _error("Wrong number of arguments.")
    filename = argv[1]
    if not os.path.isfile(filename):
        _error("Input file does not exist (or not enough permissions).")
    output_dir = argv[3] if len(argv) == 4 and argv[2] == '-o' else os.getcwd()
    if not os.path.isdir(output_dir):
        _error("Output directory does not exist (or not enough permissions).")
    return filename, output_dir


def _error(text):
    print(text)
    print(USAGE_STRING)
    sys.exit(1)

if __name__ == "__main__":
    main(sys.argv)

When I run the script, the following error appears in the command prompt:

Traceback (most recent call last):
  File "C:\Users\suppo\Desktop\python\script.py", line 71, in <module>
    main(sys.argv)
  File "C:\Users\suppo\Desktop\python\script.py", line 46, in main
    post_file.write(content.encode('utf8'))
TypeError: write() argument must be str, not bytes

So I know I have to encode or decode the content variable, but I cannot work out how to do that in this script. Can someone point me in the right direction? :)

Tags: python, wordpress, export

Solution


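Try removing the .encode('utf8') call. In Python 3, a file opened in text mode ('w') only accepts str, and content is already a str, so encoding it to bytes first is what triggers the TypeError. Write the string directly: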

post_file.write(content)
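
A minimal sketch of the two ways to make the types agree, assuming Python 3. The helper names save_text and save_bytes, the temporary path, and the sample string are hypothetical and not part of the original script. Passing encoding='utf-8' to open() lets the file object do the encoding that the removed .encode() call was attempting by hand.

```python
import os
import tempfile


def save_text(path, content):
    """Write a str to path as UTF-8 text (hypothetical helper)."""
    # Text mode ('w') expects str; open() performs the encoding itself.
    with open(path, 'w', encoding='utf-8') as post_file:
        post_file.write(content)


def save_bytes(path, content):
    """Alternative: keep the manual encode, but open the file in binary mode."""
    # Binary mode ('wb') expects bytes, so the explicit encode is valid here.
    with open(path, 'wb') as post_file:
        post_file.write(content.encode('utf-8'))


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), 'example.txt')
    save_text(target, 'Héllo wörld')
    with open(target, encoding='utf-8') as fh:
        print(fh.read())
```

Without an explicit encoding, text mode falls back to the platform default, which on Windows is often not UTF-8, so naming the encoding is probably the safer choice for WordPress content.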

You can also check type(content) to confirm that it is a str.
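
One reason the type check can matter, offered as an assumption about the input rather than something the traceback shows: ElementTree sets .text to None for an element that has no text, and find() returns None when the element is absent. Either case would make write() fail with a different TypeError. The helper text_or_empty and the sample XML below are hypothetical; this is only a sketch of a guard.

```python
from xml.etree import ElementTree


def text_or_empty(element):
    """Return element.text as a str, or '' if the element is missing or empty."""
    # Compare with None explicitly; an Element's truth value is not a
    # reliable test for whether find() located it.
    if element is None or element.text is None:
        return ''
    return element.text


if __name__ == "__main__":
    item = ElementTree.fromstring('<item><title>Hello</title><body/></item>')
    for tag in ('title', 'body', 'missing'):
        value = text_or_empty(item.find(tag))
        print(tag, type(value).__name__, repr(value))
```

In the script, the same guard could wrap the post.find(...) lookups before .text is used, so that items without a body still produce an (empty) output file.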

