How do I fix an encoding problem in my Python script?

Problem description

I ran into a problem while writing a parser (scraper) in Python.

from urllib.request import urlopen
from bs4 import BeautifulSoup

html_doc = urlopen("https://yandex.ru/images/search?from=tabbar&text=яблоко")
soup = BeautifulSoup(html_doc)
print(html_doc)

for img in soup.find_all('img'):
    print(img.get("src"))

Error:

builtins.UnicodeEncodeError: 'ascii' codec can't encode characters in position 36-41: ordinal not in range(128)
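The positions 36-41 in that message can be reproduced offline. This is a minimal sketch, assuming (as CPython's `http.client` does) that `urlopen` sends the request line `GET <path> HTTP/1.1` encoded as ASCII. The `request_line` string is a hand-built stand-in for what `urlopen` would send, not captured traffic:

```python
# Hand-built stand-in for the request line urlopen sends for the original URL
request_line = "GET /images/search?from=tabbar&text=яблоко HTTP/1.1"

try:
    # http.client encodes the request line as ASCII before sending it
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)
    # exc.start and exc.end delimit the characters that failed (end is exclusive)
    print(request_line[exc.start:exc.end])
```

Characters 36-41 are exactly the six letters of "яблоко", which suggests the non-ASCII search term in the URL is what fails to encode, not the page content.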


Tags: python

Solution


The error is caused by the non-ASCII characters in the URL. Use urllib.parse.quote to percent-encode the non-ASCII part before appending it to the URL:

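from urllib.parse import quote
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Percent-encode the non-ASCII search term so the URL is pure ASCII
url = "https://yandex.ru/images/search?from=tabbar&text=" + quote("яблоко")
html_doc = urlopen(url)

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html_doc, "html.parser")

# Print the src attribute of every <img> tag on the page
for img in soup.find_all('img'):
    print(img.get("src"))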

Output:

//im0-tub-ru.yandex.net/i?id=9550e470e4d75936eaab6bc78263d930&n=13
//im0-tub-ru.yandex.net/i?id=360c840e56e79e44037ee00e38d6c284&n=13
//im0-tub-ru.yandex.net/i?id=edae1e226592942278a0f7896ce98bdb&n=13
//im0-tub-ru.yandex.net/i?id=488807d3cee9a40c4c354ae733aa6c6a&n=13
//im0-tub-ru.yandex.net/i?id=806e10cc5c196e54a91e26d905b58636&n=13
//im0-tub-ru.yandex.net/i?id=66d4f6b8993a267504cc23ec9426f226&n=13
//im0-tub-ru.yandex.net/i?id=7da7b1f80ecc6f9f7137fcf0b61683c8&n=13
//im0-tub-ru.yandex.net/i?id=73b4d42a5f5e66be1ad5d0f599fbaf7c&n=13
//im0-tub-ru.yandex.net/i?id=ec3ee01852df27476594dfefc6364883&n=13
//im0-tub-ru.yandex.net/i?id=b1a08c606f5078f9cf4baad4fe8e459a&n=13
//im0-tub-ru.yandex.net/i?id=8fe235f57b55688b95f9d38d04fcb5d7&n=13
//im0-tub-ru.yandex.net/i?id=77235497f940c7d0e4d799319c8df5b1&n=13
//im0-tub-ru.yandex.net/i?id=19b1e58076c8ce51fd029eae0d1d7e7a&n=13
//im0-tub-ru.yandex.net/i?id=3adb56db0e22ae7fb5038633de5318b0&n=13
//im0-tub-ru.yandex.net/i?id=8e0acb9ca7b4e78ad97f8f01343129a2&n=13
//im0-tub-ru.yandex.net/i?id=fabac207fce051cd562bfcadc118d602&n=13
//im0-tub-ru.yandex.net/i?id=2bd88b1eb70d70508417439d3538d10d&n=13
//im0-tub-ru.yandex.net/i?id=47d3c4339d9a17392317b2de98a9ae23&n=13
//im0-tub-ru.yandex.net/i?id=50a21edf30c7736d44f5f3a327111ae0&n=13
//im0-tub-ru.yandex.net/i?id=1db8cd36e1333d7376bf91c8f797ce8e&n=13
//im0-tub-ru.yandex.net/i?id=c48ae350238da4cdb758dd1f1b7b0c9d&n=13
//im0-tub-ru.yandex.net/i?id=bfbf39648f7379ee23147a4d42a506fb&n=13
//im0-tub-ru.yandex.net/i?id=d9bade6482e5c015a28e85ca544d07fb&n=13
//im0-tub-ru.yandex.net/i?id=2126465dbbf5b405f50cdead47fe4ac8&n=13
//im0-tub-ru.yandex.net/i?id=10ae5a46ea6efaa4ca35040ba948df7c&n=13
//im0-tub-ru.yandex.net/i?id=d3869cd0412cf274954a8297c088002e&n=13
//im0-tub-ru.yandex.net/i?id=bbeadc9712b4978dfce7658f49692d5c&n=13
//im0-tub-ru.yandex.net/i?id=8ffdeec38792161950574eb16efc546f&n=13
//im0-tub-ru.yandex.net/i?id=65c6cd6b1094055c69b89251b0cbf150&n=13
//im0-tub-ru.yandex.net/i?id=a8c22542d8eadb335222d4b037cc7b74&n=13
//im0-tub-ru.yandex.net/i?id=b640fc2198c8731a03a435a504517b9f&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=93eb2b406a2e1e12f76484d147b19bfc&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=2eea3544a66acd4c96736cffe49d6252&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=123d340cf753b45d18050b28904ece08&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=016dcb736320a4367d8234fa746e06bb&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=b2813050d610631d93c8426ac3e9fc62&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=456ca4948973b44ded169b1ae2d3a888&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=d3f6660fca5da9413b013cbb92c91ab6&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=34e1a56bb94e872e05909c64e8d298c7&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=fe0ffea6ca299e5ba8bb435552866508&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=bd4994eecf708012fe37cb5e78411a96&n=11&ref=rq
//im0-tub-ru.yandex.net/i?id=2b6893a9f6b99d4fbeb5ca92b292af8a&n=11&ref=rq
//mc.yandex.ru/watch/722889
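To see what quote actually produces, here is a small sketch using only the standard library. It also shows urllib.parse.urlencode, an alternative (not used in the answer above) that escapes every value of a query-parameter dict at once:

```python
from urllib.parse import quote, unquote, urlencode

# quote() encodes the text as UTF-8, then percent-escapes each non-ASCII byte
encoded_term = quote("яблоко")
print(encoded_term)  # %D1%8F%D0%B1%D0%BB%D0%BE%D0%BA%D0%BE

# unquote() reverses the escaping
print(unquote(encoded_term))  # яблоко

# urlencode() builds the whole query string from a dict, escaping each value
query = urlencode({"from": "tabbar", "text": "яблоко"})
url = "https://yandex.ru/images/search?" + query
print(url)
```

Either way the final URL contains only ASCII characters, so it can be sent in the HTTP request line without a UnicodeEncodeError.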

