How to skip non-HTML IP addresses when scanning an IP range with BeautifulSoup

Problem description

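I'm trying to use BeautifulSoup to extract links from a range of IP addresses. However, the script appears to crash when an IP address does not serve HTML. Any suggestions for skipping the addresses in the loop that don't return HTML?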

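import ipaddress

import requests
from bs4 import BeautifulSoup

start_ip = ipaddress.IPv4Address(u'170.217.10.0')
end_ip = ipaddress.IPv4Address(u'172.217.10.142')
# Walk the range one address at a time (range() excludes end_ip itself)
for ip_int in range(int(start_ip), int(end_ip)):
    ip = ipaddress.IPv4Address(ip_int)
    print(ip)
    # No timeout or error handling here: this is the call that fails in the traceback
    r = requests.get("http://" + str(ip))
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    for link in soup.find_all('a'):
        print(link.get('href'))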

Result

170.217.10.0
Traceback (most recent call last):
  File "web_scraper.py", line 16, in <module>
    r  = requests.get("http://" + str(ip))
  File "c:\Python27\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "c:\Python27\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\Python27\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\Python27\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "c:\Python27\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "c:\Python27\lib\site-packages\urllib3\connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "c:\Python27\lib\site-packages\urllib3\connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "c:\Python27\lib\httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "c:\Python27\lib\httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "c:\Python27\lib\httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "c:\Python27\lib\httplib.py", line 882, in _send_output
    self.send(msg)
  File "c:\Python27\lib\httplib.py", line 844, in send
    self.connect()
  File "c:\Python27\lib\site-packages\urllib3\connection.py", line 183, in connect
    conn = self._new_conn()
  File "c:\Python27\lib\site-packages\urllib3\connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "c:\Python27\lib\site-packages\urllib3\util\connection.py", line 70, in create_connection
    sock.connect(sa)

Tags: beautifulsoup, ip

Solution
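The traceback above is cut off before its final exception line, but it ends inside sock.connect, which suggests the failure happens while opening the connection rather than while parsing a non-HTML body. A minimal sketch of one way to skip such addresses follows, assuming Python 3 (the traceback shows Python 2.7). It wraps the request in try/except with a timeout and checks the Content-Type header before handing the body to BeautifulSoup. The helper names is_html, fetch_html, extract_links and scan, the injectable get parameter, and the 5-second timeout are all illustrative assumptions, not part of the original question.

```python
import ipaddress

import requests
from bs4 import BeautifulSoup


def is_html(content_type):
    """Return True if a Content-Type header value names an HTML document."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_html(url, get=requests.get, timeout=5):
    """Return the response body, or None if the host fails or is not HTML.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to requests.get.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Refused connections, timeouts and other request failures land here
        return None
    if not is_html(response.headers.get("Content-Type", "")):
        return None
    return response.text


def extract_links(html):
    """Return the href attribute of every <a> tag (None when it is missing)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a")]


def scan(start, end):
    """Print the links found on each reachable HTML host in [start, end)."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    for ip_int in range(first, last):
        ip = ipaddress.IPv4Address(ip_int)
        html = fetch_html("http://" + str(ip))
        if html is None:
            continue  # unreachable or not HTML: move on to the next address
        for href in extract_links(html):
            print(ip, href)
```

Calling scan('170.217.10.0', '172.217.10.142') would then walk the same range as the question's loop, silently skipping any address that cannot be reached within the timeout or that answers with something other than HTML. Catching requests.exceptions.RequestException is deliberately broad; narrowing it to ConnectionError and Timeout is a reasonable alternative if other failures should still surface.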

