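python - Problem looping over a list while web scraping
Question
The code runs fine, but as soon as I add a site whose domain is not registered, it stops working. Example: sincoban.com.br. The idea is to be able to fill in some placeholder values for unregistered domains. Is there a way to solve this?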
# Script that collects all the information for ".br" domains
from bs4 import BeautifulSoup
from selenium import webdriver

sites = []
site = {}
domains = ['terra.com.br', 'oi.com.br', 'unidas.com.br', 'sincoban.com.br']

# Scrape elements
ff = webdriver.Firefox(executable_path="D:/Programas/gecko/geckodriver.exe")
for domain in domains:
    site = {}
    ff.get('https://www.whois.com/whois/' + domain)
    html = ff.page_source
    soup = BeautifulSoup(html, 'html.parser')

    # Tags of interest (soup.find returns None when nothing matches)
    list_ = soup.find('div', {'class': 'df-block'})
    h = soup.find('div', {'class': 'df-block'})

    # Website names
    try:
        names = list_
    except:
        names = ""
    names = list_
    registro = []
    for name in names:
        registro.append(name.text.split()[51])
    site['DomainInformation'] = registro
    # print(name)

    # DNS hosting
    try:
        registers = list_
    except:
        registers = ""
    registers = list_
    status = []
    try:
        element = h.text.split().index('published')
    except:
        element = ""
    element = h.text.split().index('published')  # search element
    for register in registers:
        status.append(register.text.split()[element])  # pass the searched parameter
    site['status'] = status
    # print(name)

    # List of websites
    sites.append(site)
Solution
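One possible approach, offered as a minimal sketch rather than a confirmed fix: `soup.find` returns `None` when no element matches, so iterating over the result or reading `.text` from it raises an exception. This presumably is what happens for the unregistered domain, assuming its page has no `df-block` div. Checking for `None` first lets the loop record placeholder values and move on. The helper name `extract_site` and the `PLACEHOLDER` value are hypothetical, and for simplicity the sketch works on the text of the whole block instead of iterating over its children as the question does.

```python
from typing import Optional

# Hypothetical default used when a value cannot be extracted.
PLACEHOLDER = "not registered"


def extract_site(block_text: Optional[str]) -> dict:
    """Build the per-domain record from the text of the 'df-block' div.

    block_text is None when soup.find(...) found no matching element.
    """
    # Start from placeholders so every record has the same keys.
    site = {"DomainInformation": [PLACEHOLDER], "status": [PLACEHOLDER]}
    if block_text is None:
        return site

    words = block_text.split()

    # Same fixed position the question uses; skip it on short pages.
    if len(words) > 51:
        site["DomainInformation"] = [words[51]]

    # list.index raises ValueError when the word is absent.
    try:
        site["status"] = [words[words.index("published")]]
    except ValueError:
        pass

    return site


# Inside the loop from the question, the call might look like:
#     block = soup.find('div', {'class': 'df-block'})
#     site = extract_site(block.get_text() if block is not None else None)
#     sites.append(site)
```

Catching `ValueError` specifically, instead of using a bare `except`, keeps unrelated errors such as a driver failure visible.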