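beautifulsoup - How to skip non-HTML hosts when scanning a range of IP addresses with BeautifulSoup
Problem description
I am trying to use BeautifulSoup to extract links from a range of IP addresses. However, the script seems to crash when an IP address does not serve HTML. Any suggestions for skipping the IP addresses in the loop that are not HTML?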
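import ipaddress

import requests
from bs4 import BeautifulSoup

start_ip = ipaddress.IPv4Address(u'170.217.10.0')
end_ip = ipaddress.IPv4Address(u'172.217.10.142')
# Walk every address from start_ip up to (but not including) end_ip.
for ip_int in range(int(start_ip), int(end_ip)):
    ip = ipaddress.IPv4Address(ip_int)
    print ip
    # No timeout or error handling here: this is the call that fails below.
    r = requests.get("http://" + str(ip))
    data = r.text
    soup = BeautifulSoup(data)
    for link in soup.find_all('a'):
        print(link.get('href'))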
Result
170.217.10.0
Traceback (most recent call last):
  File "web_scraper.py", line 16, in <module>
    r = requests.get("http://" + str(ip))
  File "c:\Python27\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "c:\Python27\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\Python27\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\Python27\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "c:\Python27\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "c:\Python27\lib\site-packages\urllib3\connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "c:\Python27\lib\site-packages\urllib3\connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "c:\Python27\lib\httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "c:\Python27\lib\httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "c:\Python27\lib\httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "c:\Python27\lib\httplib.py", line 882, in _send_output
    self.send(msg)
  File "c:\Python27\lib\httplib.py", line 844, in send
    self.connect()
  File "c:\Python27\lib\site-packages\urllib3\connection.py", line 183, in connect
    conn = self._new_conn()
  File "c:\Python27\lib\site-packages\urllib3\connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "c:\Python27\lib\site-packages\urllib3\util\connection.py", line 70, in create_connection
    sock.connect(sa)
Solution
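The traceback above ends inside sock.connect, which suggests the script fails while opening the connection, before BeautifulSoup ever sees a response. One possible approach, offered as a minimal sketch rather than a definitive answer, is to give each request a timeout, catch requests.exceptions.RequestException, and skip any response whose Content-Type header does not mention HTML. The helper names extract_links, fetch_links and scan, the 3-second timeout, and the injectable get parameter are assumptions made for illustration; they do not come from the question. The sketch is written for Python 3, and unlike the question's loop it ignores <a> tags that have no href.

```python
import ipaddress

import requests
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(ip, timeout=3, get=requests.get):
    """Return the links served at http://<ip>, or None if the host is skipped.

    `get` is a hypothetical hook so the function can be tried without a network.
    """
    try:
        response = get("http://" + str(ip), timeout=timeout)
    except requests.exceptions.RequestException:
        # Refused connection, timeout, unreachable host, and similar failures.
        return None
    content_type = response.headers.get("Content-Type", "")
    if "html" not in content_type.lower():
        # The host answered, but not with an HTML document.
        return None
    return extract_links(response.text)


def scan(start, end):
    """Yield (ip, links) for each address in [start, end) that serves HTML."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    for ip_int in range(first, last):
        ip = ipaddress.IPv4Address(ip_int)
        links = fetch_links(ip)
        if links is None:
            continue  # Skip this address and move on to the next one.
        yield ip, links
```

To use it, iterate over scan('170.217.10.0', '172.217.10.142') and print each ip with its links. Note that the range in the question spans more than 33 million addresses, so even with a short timeout a full sequential scan would take a very long time.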