python-3.x - Python Requests.Get - Invalid Schema error
Problem description
Another one for you.
I'm trying to scrape a list of URLs read from a CSV file. Here is my code:
from bs4 import BeautifulSoup
import requests
import csv

with open('TeamRankingsURLs.csv', newline='') as f_urls, open('TeamRankingsOutput.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    for line in csv_urls:
        page = requests.get(line[0]).text
        soup = BeautifulSoup(page, 'html.parser')
        results = soup.findAll('div', {'class' :'LineScoreCard__lineScoreColumnElement--1byQk'})
        for r in range(len(results)):
            csv_output.writerow([results[r].text])
...which gives me the following error:
Traceback (most recent call last):
  File "TeamRankingsScraper.py", line 11, in <module>
    page = requests.get(line[0]).text
  File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 616, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 707, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15'
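My CSV file is just a list of a few URLs in column A (i.e. https://www..)
(The div class I'm trying to scrape doesn't exist on that page, but that's not the problem. At least I don't think it is. I just need to update it once I can read the URLs from the CSV file.)
Any suggestions? This code works for another project, but for some reason I'm running into this problem with the new URL list. Thanks a lot!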
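Solution
From the traceback: requests.exceptions.InvalidSchema: No connection adapters were found for 'https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15'
Look at the stray characters in front of the URL in that message. The URL should start at https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15
requests picks a connection adapter by matching the start of the URL against http:// or https://, so any extra characters in front of the scheme trigger InvalidSchema. When reading the csv, use a regular expression to remove whatever comes before http/https. That should solve your problem.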
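To see what the stray characters actually are before stripping them, it can help to print the raw cell in escaped form. The sketch below is only an illustration and not part of the original answer: it assumes the junk is a UTF-8 byte order mark (a common leftover when a spreadsheet exports a CSV), and `has_http_scheme`, `strip_leading_junk`, and `cell` are hypothetical names.

```python
BOM = "\ufeff"  # Unicode byte order mark, invisible when printed normally


def has_http_scheme(raw):
    """True if the string starts with a scheme that requests can handle by default."""
    return raw.lower().startswith(("http://", "https://"))


def strip_leading_junk(raw):
    """Drop a leading BOM and surrounding whitespace; leave the rest untouched."""
    return raw.lstrip(BOM).strip()


# Hypothetical cell value, assumed to carry a BOM in front of the URL
cell = BOM + "https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15"

# ascii() escapes non-ASCII characters, so the hidden prefix becomes visible
print(ascii(cell), has_http_scheme(cell))

cleaned = strip_leading_junk(cell)
print(ascii(cleaned), has_http_scheme(cleaned))
```

If the escaped output shows something other than a BOM, the same check still tells you whether the cleaned value is safe to pass to requests.get.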
If you want to fix the immediate problem for this particular URL while reading the csv, do the following:
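import re  # standard-library module; the third-party regex package is not required

strin = "https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15"
# Non-greedy match anchored at the start, so only the text before the first "http" is dropped
clean_url = re.sub(r'^.*?http', 'http', strin)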
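This gives you a correct URL that requests can handle.
Since you asked for a complete fix that applies to every URL read in the loop, you can do the following: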
from bs4 import BeautifulSoup
import requests
import csv
import re  # standard-library module; the third-party regex package is not required

with open('TeamRankingsURLs.csv', newline='') as f_urls, open('TeamRankingsOutput.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    for line in csv_urls:
        # Remove anything that precedes the first "http" in the cell
        url = re.sub(r'^.*?http', 'http', line[0])
        page = requests.get(url).text
        soup = BeautifulSoup(page, 'html.parser')
        results = soup.find_all('div', {'class': 'LineScoreCard__lineScoreColumnElement--1byQk'})
        # Write one output row per matching div
        for r in results:
            csv_output.writerow([r.text])
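An alternative to the regular expression, if the stray characters turn out to be a UTF-8 byte order mark, is to let Python drop it while decoding the file. This is a sketch under that assumption, not something the original answer proposes: `read_urls` and the throwaway `urls.csv` file are hypothetical, and the `utf-8-sig` codec only helps when the file really is UTF-8 with a BOM.

```python
import csv
import os
import tempfile


def read_urls(csv_path):
    """Read column A of a CSV, decoding with utf-8-sig so a leading BOM is dropped."""
    with open(csv_path, newline='', encoding='utf-8-sig') as f_urls:
        return [row[0].strip() for row in csv.reader(f_urls) if row]


# Demonstration on a throwaway file that is deliberately written with a BOM
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'urls.csv')
    with open(demo_path, 'w', newline='', encoding='utf-8-sig') as f:
        f.write('https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15\n')
    urls = read_urls(demo_path)

print(urls)
```

In the scraper itself, this would amount to adding encoding='utf-8-sig' to the open() call for TeamRankingsURLs.csv; the regular expression then becomes a fallback for other kinds of leading junk.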