Extracting information from a same-page popup in Python Selenium web scraping

Problem description

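Note: I am experienced in Python but new to Selenium and web scraping. Apologies if this is a bad question or if my Selenium basics seem shaky. I could not find an answer after several hours of searching, so I am asking here.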

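Goal: extract the "About the Business" information from a business's Yelp page. Some pages show this information in a popup opened by a "Read more" button (example: https://www.yelp.com/biz/and-pizza-bethesda-bethesda). Other pages do not show their business information in such a popup (example: https://www.yelp.com/biz/pneuma-fashions-upper-marlboro-3).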

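Problem: I cannot navigate to the "About the Business" popup that appears after clicking the "Read more" button, so I cannot extract the text inside it.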

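Attempts so far: by googling I found explanations of how to handle alert popups and window popups, but that code does not work here. The popup that appears after clicking the "Read more" button does not cause any change in window_handles.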

    from selenium.webdriver.common.by import By

    # `driver` is an already-initialised WebDriver with the Yelp page loaded
    # get all sections of the page
    result = driver.find_elements(By.TAG_NAME, "section")
    About = None
    for sec in result:
        if sec.text.startswith("About the Business"):
            # this pertains only to the "About the Business" section

            main_page = driver.current_window_handle
            print(main_page)  # prints the current handle

            sec.find_element(By.TAG_NAME, "button").click()
            popup = None
            for handle in driver.window_handles:  # an iterable with only one handle
                # the only handle present is the main_page handle
                print(handle)
                if handle != main_page:
                    popup = handle
            print(popup)  # prints None
            driver.switch_to.window(popup)  # throws an error because popup is None

            # THE FOLLOWING SECTION IS NOT EXECUTED BECAUSE OF THE ERROR ABOVE
            # ////////////////////////////////////////////////////

            button_contents = driver.find_elements(By.TAG_NAME, "p")
            for b in button_contents:
                print(b.text)  # intended to print the text contents
            close = driver.find_element(By.TAG_NAME, "button")
            close.click()
            driver.switch_to.window(main_page)


Please help.

Thanks to everyone who reads this question and offers suggestions and answers.

Tags: selenium, selenium-webdriver, dom, web-scraping, beautifulsoup

Solution


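One thing you should know is that the popup does not open in a new window. It is rendered within the same page, so there is no new window handle to switch to. Here is the complete code to extract the text from the popup: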

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()

    driver.get('https://www.yelp.com/biz/and-pizza-bethesda-bethesda')

    try:
        # click the "Read more" button in the "About the Business" section
        driver.find_element(By.XPATH, '//*[@id="wrap"]/div[3]/div/div[4]/div/div/div[2]/div/div/div[1]/div/div[1]/section[5]/div[2]/button').click()

        # the modal lives in the same DOM, so it can be queried directly
        p1 = driver.find_element(By.XPATH, '//*[@id="modal-portal-container"]/div[2]/div/div/div/div[2]/div/div[2]/div/div[2]/div/div/div[1]/p').text

        p2 = driver.find_element(By.XPATH, '//*[@id="modal-portal-container"]/div[2]/div/div/div/div[2]/div/div[2]/div/div[2]/div/div/div[2]/p[2]').text

        print("Specialties --", p1)
        print("History --", p2)

    except NoSuchElementException:
        print('Read more button not found')

Output:

Specialties -- Award-winning pizza: Named one of Fast Company's "World's Most Innovative Companies" in 2018, third-place in the Washington Post Express's of "Best Fast Casual" in 2018, third place in the Washington City Paper's "Best Gluten-Free Menu" in 2018 and won its "Best Pizza in D.C." in 2017, 11th on TripAdvisor's "Best Fast Casual Restaurants -- United States" in 2018.
History -- Since 2012, we've built pizza shops with an edge to their craft pies, beverages and shop design, created an environment where ALL of our Tribe can thrive, supported our local communities and now we'll text you back, if you want. Started with a pizza shop. Became a culture. That's &pizza.
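
Because the popup is part of the same document, its text should also be present in `driver.page_source` once the modal is open. As an alternative to the long absolute XPaths, here is a rough sketch that pulls every non-empty `<p>` out of the `modal-portal-container` element using only the standard library's `html.parser`. The names `ModalParagraphs` and `modal_paragraphs` are made up for illustration, and the sketch assumes the container id seen in the XPaths above is still what the page uses.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ModalParagraphs(HTMLParser):
    """Collect the text of every <p> nested inside the modal container."""

    def __init__(self, container_id="modal-portal-container"):
        super().__init__(convert_charrefs=True)
        self.container_id = container_id  # assumed id, taken from the XPaths
        self.depth = 0        # nesting depth inside the container (0 = outside)
        self.in_p = False     # True while inside a <p> within the container
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Treat a line break inside a paragraph as whitespace.
            if tag == "br" and self.in_p:
                self._buf.append(" ")
            return
        if self.depth:
            self.depth += 1
            if tag == "p":
                self.in_p = True
                self._buf = []
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag == "p" and self.in_p:
            # Collapse runs of whitespace and skip empty paragraphs.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self.in_p = False
        self.depth -= 1

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)


def modal_paragraphs(html):
    """Return the paragraph texts found inside the modal container."""
    parser = ModalParagraphs()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

After clicking the button you would call `modal_paragraphs(driver.page_source)`. If BeautifulSoup is already installed, it would do the same job in fewer lines.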

Edit:

Since the XPath above does not work for the other page, replace the first XPath lookup (the one that clicks the button) with:

    # By comes from selenium.webdriver.common.by
    driver.find_element(By.XPATH, "//div[@class='lemon--div__373c0__1mboc border-color--default__373c0__3-ifU']/button[.='Read more']").click()

This works for both pages.
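
Both XPaths above depend on generated class names or on the exact page layout, so they may break when the markup changes. Another option, sketched below under some assumptions, is to find the button the way the question did (the `section` whose text starts with "About the Business") and then poll the `modal-portal-container` element until its paragraphs show up. The helper names `wait_until` and `read_about_popup` are hypothetical. The locator strings "tag name" and "css selector" are the values behind `By.TAG_NAME` and `By.CSS_SELECTOR`, and the small polling loop is only a stand-in for Selenium's `WebDriverWait` so that the sketch needs no extra imports.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.25):
    """Call probe() until it returns something truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def read_about_popup(driver, timeout=10.0):
    """Return the modal's paragraph texts, or None if there is no button.

    Assumes the modal is rendered inside #modal-portal-container, as in the
    XPaths from the answer. Returns an empty list if the modal never appears.
    """
    for section in driver.find_elements("tag name", "section"):
        if not section.text.startswith("About the Business"):
            continue
        buttons = section.find_elements("tag name", "button")
        if not buttons:
            return None  # section exists but has no "Read more" button
        buttons[0].click()
        # The modal is added to the same DOM, so no window switch is needed;
        # just wait for its paragraphs to be present.
        paragraphs = wait_until(
            lambda: driver.find_elements(
                "css selector", "#modal-portal-container p"),
            timeout=timeout,
        )
        return [p.text.strip() for p in (paragraphs or []) if p.text.strip()]
    return None  # no "About the Business" section on this page
```

Call it as `read_about_popup(driver)` after `driver.get(...)`. It returns `None` for pages without the section or the button, which should cover the second kind of page mentioned in the question.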

