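python - Python write() argument must be str
Problem description
I am using this Python script to convert a WordPress .XML export file to .txt. It works for blog posts, but it fails on other post types.
I have changed some of the code, but it still does not work for either the other post types or blog posts. This is the code I have at the moment: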
#!/usr/bin/env python
"""This script converts WXR file to a number of plain text files.

WXR stands for "WordPress eXtended RSS", which basically is just a
regular XML file. This script extracts entries from the WXR file into
plain text files. Output format: article name prefixed by date for
posts, article name for pages.

Usage: wxr2txt.py filename [-o output_dir]
"""

import os
import re
import sys
from xml.etree import ElementTree

NAMESPACES = {
    'content': 'http://purl.org/rss/1.0/modules/content/',
    'wp': 'http://wordpress.org/export/1.2/',
}
USAGE_STRING = "Usage: wxr2txt.py filename [-o output_dir]"


def main(argv):
    filename, output_dir = _parse_and_validate_output(argv)
    try:
        data = ElementTree.parse(filename).getroot()
    except ElementTree.ParseError:
        _error("Invalid input file format. Can not parse the input.")
    page_counter, post_counter = 0, 0
    for post in data.find('channel').findall('item'):
        post_type = post.find('wp:post_type', namespaces=NAMESPACES).text
        content = post.find('content:encoded', namespaces=NAMESPACES).text
        date = post.find('wp:post_date', namespaces=NAMESPACES).text
        title = post.find('title').text
        date = date.split(' ')[0].replace('-', '')
        title = re.sub(r'[_]+', '_', re.sub(r'[^a-z0-9+]', '_', title.lower()))
        if post_type == 'post':
            post_filename = date + '_' + title + '.txt'
            post_counter += 1
        else:
            post_filename = title + '.txt'
            page_counter += 1
        with open(os.path.join(output_dir, post_filename), 'w') as post_file:
            post_file.write(content.encode('utf8'))
        post_counter += 1
    print("Saved {} posts and {} pages in directory '{}'.".format(
        post_counter, page_counter, output_dir))


def _parse_and_validate_output(argv):
    if len(argv) not in (2, 4):
        _error("Wrong number of arguments.")
    filename = argv[1]
    if not os.path.isfile(filename):
        _error("Input file does not exist (or not enough permissions).")
    output_dir = argv[3] if len(argv) == 4 and argv[2] == '-o' else os.getcwd()
    if not os.path.isdir(output_dir):
        _error("Output directory does not exist (or not enough permissions).")
    return filename, output_dir


def _error(text):
    print(text)
    print(USAGE_STRING)
    sys.exit(1)


if __name__ == "__main__":
    main(sys.argv)
When I run the script, the following error appears in the command prompt:
Traceback (most recent call last):
  File "C:\Users\suppo\Desktop\python\script.py", line 71, in <module>
    main(sys.argv)
  File "C:\Users\suppo\Desktop\python\script.py", line 46, in main
    post_file.write(content.encode('utf8'))
TypeError: write() argument must be str, not bytes
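So I understand that I have to encode or decode the content variable, but I can't work out how to do that in this script. Can someone point me in the right direction? :)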
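Solution
Try removing the .encode('utf8') call. In Python 3, a file opened in text mode ('w') accepts only str, and content.encode('utf8') returns bytes, which is what raises the TypeError. Write the string directly:
post_file.write(content)
You can also check type(content)
to make sure it is a str.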
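A minimal sketch of that fix, assuming Python 3. The helper name save_text, the temporary output directory, and the sample text are hypothetical and not part of the original script. Two things here go beyond the answer and are assumptions about what the script may need: passing encoding='utf-8' to open() so the output does not depend on the platform default encoding, and guarding against None, because an element's .text in ElementTree is None when the element is empty.

```python
import os
import tempfile


def save_text(path, content):
    """Write content to path as UTF-8 text (hypothetical helper)."""
    # Assumption: an empty <content:encoded> element yields .text == None,
    # so fall back to an empty string instead of failing in write().
    text = content if content is not None else ''
    # Text mode ('w') expects str; open() encodes to bytes for us.
    with open(path, 'w', encoding='utf-8') as post_file:
        post_file.write(text)


# Usage: write a str containing non-ASCII characters to a temporary directory.
output_dir = tempfile.mkdtemp()
post_path = os.path.join(output_dir, 'example.txt')
save_text(post_path, 'Café, naïve, 日本語')
```

If you really do want to write bytes yourself, the alternative would be to open the file in binary mode ('wb') and keep the .encode('utf8') call; mixing text mode with bytes is what triggers the error.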