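python - How do I pass a text string to urllib's data parameter?
Problem description
I'm following these guidelines (although they are written for Python 2) to run a search here, and the query I need is: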
queryText = """
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""
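I know this query is correct because when I enter it into the "Sample XML Queries" box at the second link (with "Source Organism Browser (NCBI)" selected), I get output (this is only the beginning of it):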
383 results
1Q2W:1 1QZ8:1 1SSK:1 1UJ1:1 1UK2:1 1UK3:1 1UK4:1 1UW7:1 1WNC:1 1WOF:1 1WYY:1 1XAK:1 1YO4:1 1YSY:1 1Z1I:1 1Z1J:1 1ZV7:1 1ZV8:1 1ZV8:2 1ZVA:1 1ZVB:1 2A5A:1 2A5I:1 2A5K:1 2ACF:1 2AHM:1 2AHM:2 2AJF:2 2ALV:1 2AMD:1 2AMQ:1 2BEQ:1 2BEQ:2 2BEZ:1 2BEZ:2 2BX3:1 2BX4:1 2C3S:1 2CJR:1 2CME:1 2CME:2 2CME:3 2CME:4 2D2D:1 2DD8:3 2DUC:1 2FAV:1 2FE8:1 2FXP:1 2FYG:1 2G9T:1 2GA6:1 2GDT:1 2GHV:1 2GHW:1 2GIB:1 2GRI:1 2GT7:1 2GT8:1 2GTB:1 2GX4:1 2GZ7:1 2GZ8:1 2GZ9:1 2H2Z:1 2H85:1 2HOB:1 2HSX:1 2IDY:1 2JW8:1 2JZD:1 2JZE:1 2JZF:1 2K7X:1 2K87:1 2KAF:1 2KQV:1 2KQW:1 2KYS:1 2LIZ:1 2MM4:1 2OFZ:1 2OG3:1 2OP9:1 2OZK:1 2PWX:1 2Q6G:1 2QC2:1 2
I now want to reproduce this search in Python, so I wrote this:
import urllib
import urllib.parse
import urllib.request
url = 'http://www.rcsb.org/pdb/rest/search'
queryText = """
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""
encoded_data = urllib.parse.urlencode(queryText).encode('utf-8')
req = urllib.request.Request(url)
with urllib.request.urlopen(req,data=encoded_data) as f:
    resp = f.read()
    print(resp)
I get this error:
Traceback (most recent call last):
File "/Users/slowat/anaconda/envs/py3/lib/python3.6/urllib/parse.py", line 892, in urlencode
raise TypeError
TypeError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "generate_pdbs_from_rcsb.py", line 19, in <module>
encoded_data = urllib.parse.urlencode(queryText).encode('utf-8')
File "/Users/slowat/anaconda/envs/py3/lib/python3.6/urllib/parse.py", line 900, in urlencode
"or mapping object").with_traceback(tb)
File "/Users/slowat/anaconda/envs/py3/lib/python3.6/urllib/parse.py", line 892, in urlencode
raise TypeError
TypeError: not a valid non-string sequence or mapping obj
Can someone demonstrate how to get this code working?
Update 1: I also tried:
url = 'http://www.rcsb.org/pdb/rest/search'
d = dict(queryType='org.pdb.query.simple.TreeEntityQuery',n='694009')
f = urllib.parse.urlencode(d)
f = f.encode('utf-8')
req = urllib.request.Request(url,f)
with urllib.request.urlopen(req) as f:
    resp = f.read()
    print(resp)
which outputs:
'Problem creating Query from XML: Content is not allowed in prolog.\nqueryType=org.pdb.query.simple.TreeEntityQuery&n=694009\n'
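Solution
The urlencode function expects a dictionary of key: value pairs (or a sequence of two-element tuples), which is why passing it a plain string raises TypeError. There is no need to use it here, because you are submitting the XML directly to the service. The data parameter should be bytes, so make sure queryText is a byte sequence rather than a string. This is specific to Python 3: the b prefix before the opening """ marks the literal as bytes rather than a plain string: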
import urllib
import urllib.parse
import urllib.request
url = 'http://www.rcsb.org/pdb/rest/search'
queryText = b"""
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""
req = urllib.request.Request(url)
with urllib.request.urlopen(req,data=queryText) as f:
    resp = f.read()
    print(resp)
This gives the result you expect in resp.
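If the query already exists as an ordinary str (for example, because it is built at runtime), calling .encode('utf-8') on it should be equivalent to writing a b"""...""" literal. The following is a minimal offline sketch, not part of the original answer: it contrasts what urlencode does with a dict and with a string, then builds the Request without sending it. The names query_text, form_body, xml_body, and urlencode_failed are illustrative only.

```python
import urllib.parse
import urllib.request

# Same endpoint as in the question; nothing is sent in this sketch.
url = 'http://www.rcsb.org/pdb/rest/search'

# The query as a plain str (hypothetical variable name).
query_text = """
<?xml version="1.0" encoding="UTF-8"?>
<orgPdbQuery>
<queryType>org.pdb.query.simple.TreeEntityQuery</queryType>
<description>TaxonomyTree Search for OTHER SEQUENCES</description>
<t>1</t>
<n>694009</n>
<nodeDesc>OTHER SEQUENCES</nodeDesc>
</orgPdbQuery>
"""

# urlencode turns a mapping into a form-encoded string, which is
# what the service rejected in "Update 1".
form_body = urllib.parse.urlencode(
    {'queryType': 'org.pdb.query.simple.TreeEntityQuery', 'n': '694009'}
)

# Given a plain string instead, urlencode raises TypeError.
try:
    urllib.parse.urlencode(query_text)
    urlencode_failed = False
except TypeError:
    urlencode_failed = True

# Encoding the str yields the bytes object that the data parameter needs.
xml_body = query_text.encode('utf-8')

# Attaching data to a Request makes urllib issue a POST when it is opened.
req = urllib.request.Request(url, data=xml_body)
method = req.get_method()
```

Passing req to urllib.request.urlopen, as in the answer above, would then submit the XML body unchanged.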