Problem when web scraping in a loop over a list

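Problem description

The code works fine, but when I add a website whose domain is not registered, the code stops working. Example: sincoban.com.br. The idea is to fill in some placeholder values for unregistered domains. Is there any way to solve this?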

# Script that collects all the information for ".br" domains
from bs4 import BeautifulSoup
from selenium import webdriver

sites = []
site = {}

domains = ['terra.com.br','oi.com.br','unidas.com.br','sincoban.com.br']

#scrape elements
ff = webdriver.Firefox(executable_path="D:/Programas/gecko/geckodriver.exe")

for domain in domains:

    site = {}

    ff.get('https://www.whois.com/whois/'+ domain)
    html = ff.page_source
    soup = BeautifulSoup(html,'html.parser')

    # Tags of interest
    list_ = soup.find('div', {'class':'df-block'})
    h = soup.find('div', {'class':'df-block'})

    # Website names
    try:
        names = list_
    except:
        names = ""

    names = list_
    registro = []
    for name in names:
        registro.append(name.text.split()[51])
        site['DomainInformation'] = registro
        #print(name)


    #DNS hosting
    try:
        registers = list_
    except:
        registers = ""

    registers = list_
    status = []

    try:
        element = h.text.split().index('published')

    except:
        element = ""

    element = h.text.split().index('published')  # position of the search term
    for register in registers:
        status.append(register.text.split()[element])  # use the position found above
        site['status'] = status
        #print(name)


    #List web sites
    sites.append(site)


Tags: python, function, selenium

Solution
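
The question does not show the traceback, but a likely cause is that soup.find('div', {'class':'df-block'}) returns None when the page for an unregistered domain has no such div, so the later loop over it fails. The try/except blocks in the question only wrap a plain assignment, which cannot raise, so they never catch anything, and the second, unguarded .index('published') call raises ValueError whenever the word is missing. Below is a minimal sketch of one way to handle this, assuming that is the cause. The names build_site and PLACEHOLDER are hypothetical and do not come from the question.

```python
PLACEHOLDER = "not registered"  # hypothetical value; use whatever suits your data


def build_site(block):
    """Return the record for one domain.

    `block` is whatever soup.find('div', {'class': 'df-block'}) returned:
    a Tag, or None when the page contains no such div.
    """
    if block is None:
        # Nothing to parse: fill in placeholders instead of crashing.
        return {"DomainInformation": [PLACEHOLDER], "status": [PLACEHOLDER]}

    site = {"DomainInformation": [], "status": []}

    # Position of the word 'published' in the block text, or None if absent.
    words = block.text.split()
    element = words.index("published") if "published" in words else None

    for child in block:
        tokens = child.text.split()
        # 51 is the index used in the question; skip children whose text
        # is too short to have that position.
        if len(tokens) > 51:
            site["DomainInformation"].append(tokens[51])
        if element is not None and len(tokens) > element:
            site["status"].append(tokens[element])

    return site
```

Inside the original loop, the body after soup = BeautifulSoup(...) would then reduce to sites.append(build_site(soup.find('div', {'class': 'df-block'}))). Checking for None explicitly, rather than wrapping everything in a bare except, keeps other errors such as network failures visible.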

