python - Problem with requests.get() inside a loop: "No connection adapters were found"
Problem description
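I'm trying to scrape several pages using the site's JSON API. When I run the code for a single URL (the first part of the code below), I get the output I want. However, when I do the same for multiple URLs in a for loop, requests raises "No connection adapters were found". This makes little sense to me, because the same URL works outside the for loop.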
# Import package
import requests
from pandas import json_normalize
import pandas as pd
# Assign URL to variable: url
url = 'https://www.olx.com.gt/api/relevance/search?category=367&facet_limit=100&location=4168811&location_facet_limit=20&page=1&sorting=desc-creation&user=16c20011d0fx61aada41'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Decode the JSON data into a dictionary: json_data
json_data = r.json()
# Extract data from the Json file
json_data_2 = json_data['data']
#normalize json data into a dataframe
df = json_normalize(json_data_2)
df.head()
With this script, everything runs smoothly. This is where I get the error:
%%time
n_paginas = 0
all_urls = pd.DataFrame()
for paginas in range(0,20):
    n_paginas += 1
    olx_url = 'https://www.olx.com.gt/api/relevance/search?category=367&facet_limit=100&location=4168811&location_facet_limit=20&page=%s&sorting=desc-creation&user=16c20011d0fx61aada41'
    start_urls = [olx_url % n_paginas]
    r = requests.get(start_urls)
    #json_data = r.json()
    #json_data_2 = json_data['data']
    #df = json_normalize(json_data_2)
    #all_urls.append(df)
Here is the traceback:
---------------------------------------------------------------------------
InvalidSchema Traceback (most recent call last)
<timed exec> in <module>
~/anaconda3/lib/python3.7/site-packages/requests/api.py in get(url, params, **kwargs)
74
75 kwargs.setdefault('allow_redirects', True)
---> 76 return request('get', url, params=params, **kwargs)
77
78
~/anaconda3/lib/python3.7/site-packages/requests/api.py in request(method, url, **kwargs)
59 # cases, and look like a memory leak in others.
60 with sessions.Session() as session:
---> 61 return session.request(method=method, url=url, **kwargs)
62
63
~/anaconda3/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
528 }
529 send_kwargs.update(settings)
--> 530 resp = self.send(prep, **send_kwargs)
531
532 return resp
~/anaconda3/lib/python3.7/site-packages/requests/sessions.py in send(self, request, **kwargs)
635
636 # Get the appropriate adapter to use
--> 637 adapter = self.get_adapter(url=request.url)
638
639 # Start time (approximately) of the request
~/anaconda3/lib/python3.7/site-packages/requests/sessions.py in get_adapter(self, url)
726
727 # Nothing matches :-/
--> 728 raise InvalidSchema("No connection adapters were found for {!r}".format(url))
729
730 def close(self):
InvalidSchema: No connection adapters were found for "['https://www.olx.com.gt/api/relevance/search?category=367&facet_limit=100&location=4168811&location_facet_limit=20&page=1&sorting=desc-creation&user=16c20011d0fx61aada41']"
The new URLs based on the page number are generated correctly, and if I paste any one of them into the script above, it works.
Any help would be greatly appreciated.
Thank you in advance.
Solution
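The culprit is start_urls = [olx_url % n_paginas]. It wraps the URL in a list, but requests.get() expects the URL as a string. requests converts whatever it is given with str(), so the list turns into the text "['https://...']", which is exactly what the traceback shows. That text does not start with http:// or https://, so no connection adapter matches it and InvalidSchema is raised.
You don't need the list at all; pass the formatted URL string directly. This slight modification to the for loop gets the result: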
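Before the full script, here is a minimal offline sketch of that explanation. It calls Session.get_adapter() directly (the same method that raises in the traceback), so nothing is sent over the network. The shortened page_url and the helper adapter_for are hypothetical, for illustration only.

```python
import requests
from requests.exceptions import InvalidSchema


def adapter_for(url):
    """Return the name of the adapter requests would pick for url, or the error text."""
    with requests.Session() as session:
        try:
            # get_adapter() compares the URL against the mounted prefixes
            # ("https://" and "http://" by default) and raises InvalidSchema if none match
            return type(session.get_adapter(url)).__name__
        except InvalidSchema as exc:
            return str(exc)


# Shortened example URL (hypothetical, for illustration only)
page_url = "https://www.olx.com.gt/api/relevance/search?page=1"
start_urls = [page_url]  # what the question's loop passes to requests.get()

# requests turns a non-string URL into text with str(); for a list that is its repr
as_text = str(start_urls)
print(as_text)                # ['https://www.olx.com.gt/api/relevance/search?page=1']
print(adapter_for(as_text))   # the "No connection adapters were found" error text
print(adapter_for(page_url))  # HTTPAdapter
```

The plain string matches the "https://" prefix; its list wrapper starts with "['" and matches nothing.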
# Import package
import requests
from pandas import json_normalize
import pandas as pd
# Assign URL to variable: url
url = 'https://www.olx.com.gt/api/relevance/search?category=367&facet_limit=100&location=4168811&location_facet_limit=20&page=1&sorting=desc-creation&user=16c20011d0fx61aada41'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Decode the JSON data into a dictionary: json_data
json_data = r.json()
# Extract data from the Json file
json_data_2 = json_data['data']
#normalize json data into a dataframe
df = json_normalize(json_data_2)
df.head()
n_paginas = 0
frames = []  # one DataFrame per page
for paginas in range(0,20):
    n_paginas += 1
    olx_url = 'https://www.olx.com.gt/api/relevance/search?category=367&facet_limit=100&location=4168811&location_facet_limit=20&page={}&sorting=desc-creation&user=16c20011d0fx61aada41'.format(n_paginas)
    # Pass the URL string itself, not a list containing it
    r = requests.get(olx_url)
    frames.append(json_normalize(r.json()['data']))
# DataFrame.append was removed in pandas 2.0; concatenate once after the loop instead
all_urls = pd.concat(frames)
all_urls.shape
(400, 60)
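As a further option, requests can build the query string itself from a params dict, so only the page number changes between requests and no string formatting is needed. The sketch below assumes the endpoint accepts the same parameters in any order (requests appends page last); BASE_PARAMS, page_params, page_url and fetch_pages are hypothetical names, and fetch_pages has not been run against the live API, which may have changed since this question was posted.

```python
import pandas as pd
import requests
from pandas import json_normalize

# Query parameters copied from the question's URL; only "page" changes between requests
BASE_URL = "https://www.olx.com.gt/api/relevance/search"
BASE_PARAMS = {
    "category": 367,
    "facet_limit": 100,
    "location": 4168811,
    "location_facet_limit": 20,
    "sorting": "desc-creation",
    "user": "16c20011d0fx61aada41",
}


def page_params(page):
    """Query parameters for one results page."""
    return {**BASE_PARAMS, "page": page}


def page_url(page):
    """Build (without sending) the URL requests would use for one page."""
    return requests.Request("GET", BASE_URL, params=page_params(page)).prepare().url


def fetch_pages(pages, timeout=30):
    """Download each page and stack its "data" records into one DataFrame (needs network)."""
    frames = []
    with requests.Session() as session:  # reuse one connection pool for every page
        for page in pages:
            resp = session.get(BASE_URL, params=page_params(page), timeout=timeout)
            resp.raise_for_status()  # stop on HTTP errors instead of parsing an error page
            frames.append(json_normalize(resp.json()["data"]))
    return pd.concat(frames, ignore_index=True)


# Offline check: print the URLs for the first three pages without sending anything
for page in range(1, 4):
    print(page_url(page))
```

With that in place, all_urls = fetch_pages(range(1, 21)) would stand in for the loop above.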