Scraping with BeautifulSoup and Selenium

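Problem description

I am learning how to scrape with BeautifulSoup and Selenium. I found a website that has multiple tables and located the table tags (this is my first time dealing with them). I am trying to scrape the text from each table and append each element to its respective list. For now I am only trying to scrape the first table; the rest I want to do on my own. But for some reason I cannot access the tag.

I also brought in Selenium to access the site, because when I copy the link to the site into another tab, the list of tables disappears for some reason.

My code so far: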

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from selenium import webdriver
from selenium.webdriver.support.ui import Select

PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

targetSite =  "https://www.sdvisualarts.net/sdvan_new/events.php"
driver.get(targetSite)

select_event = Select(driver.find_element_by_name('subs'))
select_event.select_by_value('All')

select_loc = Select(driver.find_element_by_name('loc'))
select_loc.select_by_value("All")

driver.find_element_by_name("submit").click()


targetSite   = "https://www.sdvisualarts.net/sdvan_new/viewevents.php"
event_title = []
name = []
address = []
city = []
state = []
zipCode = []
location = []
webSite = []
fee = []
event_dates = []
opening_dates = []
description = []

try:
    page = requests.get(targetSite )
    soup = BeautifulSoup(page.text, 'html.parser')
    items = soup.find_all('table', {"class":"popdetail"})
    for i in items:
        event_title.append(item.find('b', {'class': "text"})).text.strip()
        name.append(item.find('td', {'class': "text"})).text.strip()
        address.append(item.find('td', {'class': "text"})).text.strip()
        city.append(item.find('td', {'class': "text"})).text.strip()
        state.append(item.find('td', {'class': "text"})).text.strip()
        zipCode.append(item.find('td', {'class': "text"})).text.strip()

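Can someone let me know whether I am doing this correctly? This is the first time I have dealt with a site whose URL loses its elements when copied into a new tab and/or window.

So far, I have not been able to append any information to any of the lists.

Tags: python, selenium, selenium-webdriver, beautifulsoup

Solution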


One problem is the for loop.

You have for i in items:, but the loop body refers to item instead of i.
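
Fixing the loop variable alone is not enough for the code in the question: list.append() returns None, so chaining .text.strip() after the closing parenthesis of append(...) raises an AttributeError. A minimal sketch of a corrected loop is below. The HTML snippet is made up for illustration and only mimics the class names used in the question, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names from the question.
html = """
<table class="popdetail">
  <tr><td><b class="text"> First event </b></td></tr>
</table>
<table class="popdetail">
  <tr><td><b class="text">Second event</b></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
event_title = []

# Use the same name in the loop header and in the loop body.
for item in soup.find_all("table", {"class": "popdetail"}):
    tag = item.find("b", {"class": "text"})
    # Extract the text first, then append it; skip tables without a match.
    if tag is not None:
        event_title.append(tag.text.strip())

print(event_title)
```

The None check matters because find() returns None when nothing matches, and calling .text on None fails in the same way.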

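Second, if you are using Selenium to render the page, you should probably use Selenium to get the HTML as well. The page also has tables nested inside tables, so it is not as simple as iterating over the <table> tags. What I ended up doing was letting pandas read the tables (which returns a list of DataFrames) and then iterating over those, because the DataFrames follow a regular pattern.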

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# Raw string so the backslashes in the Windows path are taken literally.
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 3 style; recent Selenium 4 releases take a Service object instead.
driver = webdriver.Chrome(PATH)

targetSite = "https://www.sdvisualarts.net/sdvan_new/events.php"
driver.get(targetSite)

# Choose "All" in both drop-downs, then submit the search form.
select_event = Select(driver.find_element(By.NAME, 'subs'))
select_event.select_by_value('All')

select_loc = Select(driver.find_element(By.NAME, 'loc'))
select_loc.select_by_value("All")

driver.find_element(By.NAME, "submit").click()

event_title = []
name = []
address = []
city = []
state = []
zipCode = []
location = []
webSite = []
fee = []
event_dates = []
opening_dates = []
description = []

# Parse every <table> in the rendered page into a list of DataFrames.
dfs = pd.read_html(StringIO(driver.page_source))
# Call quit() with parentheses; a bare driver.close does nothing.
driver.quit()

for idx, table in enumerate(dfs):
    # Each event starts with a table whose first cell is 'Event Title'.
    if table.iloc[0,0] == 'Event Title':
        event_title.append(table.iloc[-1,0])

        # The detail tables sit at fixed offsets after the title table.
        # Index each one by its label column so values can be looked up by name.
        tempA = dfs[idx+1]   # venue details
        tempA.index = tempA[0]

        tempB = dfs[idx+4]   # fee and dates
        tempB.index = tempB[0]

        tempC = dfs[idx+5]   # description
        tempC.index = tempC[0]

        name.append(tempA.loc['Name',1])
        address.append(tempA.loc['Address',1])
        city.append(tempA.loc['City',1])
        state.append(tempA.loc['State',1])
        zipCode.append(tempA.loc['Zip',1])
        location.append(tempA.loc['Location',1])
        webSite.append(tempA.loc['Web Site',1])

        fee.append(tempB.loc['Fee',1])
        event_dates.append(tempB.loc['Dates',1])
        opening_dates.append(tempB.loc['Opening Days',1])

        description.append(tempC.loc['Event Description',1])

df = pd.DataFrame({'event_title':event_title,
                    'name':name,
                    'address':address,
                    'city':city,
                    'state':state,
                    'zipCode':zipCode,
                    'location':location,
                    'webSite':webSite,
                    'fee':fee,
                    'event_dates':event_dates,
                    'opening_dates':opening_dates,
                    'description':description})

Output:

print (df.to_string())
                                          event_title                            name                                    address         city       state zipCode             location                                            webSite                                                fee                              event_dates                                      opening_dates                                        description
0   The San Diego Museum of Art Welcomes a Special...         San Diego Museum of Art                 1450 El Prado, Balboa Park    San Diego          CA   92101    Central San Diego                            https://www.sdmart.org/                                                NaN    Starts On 6-18-2020 Ends On 1-10-2021  Opens virtually on June 18. The work will beco...  The San Diego Museum of Art is launching its f...
1                New Exhibit: Miller Dairy Remembered  Lemon Grove Historical Society  3185 Olive Street, Treganza Heritage Park  Lemon Grove          CA   91945    Central San Diego                        http://www.lghistorical.org  Children 12 and under free and must be accompa...    Starts On 6-27-2020 Ends On 12-4-2020  Exhibit on view Saturdays 11 am to 2 pm; close...  From 1926 there were cows smack in the midst o...
2                               Gizmos and Shivelight             Distinction Gallery                           317 E. Grand Ave    Escondido          CA   92025  North County Inland                      http://www.distinctionart.com                                                NaN     Starts On 7-14-2020 Ends On 9-5-2020                                08/08/20 - 09/05/20  Distinction Gallery is proud to present our so...
3                  Virtual Opening - July Exhibitions               Vision Art Museum                   2825 Dewey Rd. Suite 100    San Diego          CA   92106    Central San Diego                    http://www.visionsartmuseum.org                                               Free    Starts On 7-18-2020 Ends On 10-4-2020                                                NaN  Join Visions Art Museum for a virtual exhibiti...
4   Laying it Bare: The Art of Walter Redondo and ...             Fresh Paint Gallery                     1020-B Prospect Street     La Jolla          CA   92037    Central San Diego                      http://freshpaintgallery.com/                                                NaN     Starts On 8-1-2020 Ends On 9-27-2020            Tuesday through Sunday. Mondays closed.  A two-person exhibit of new abstract expressio...
5    Online oil painting lessons with Concetta Antico                             NaN                                        NaN          NaN         NaN     NaN              Virtual  http://concettaantico.com/live-online-oil-pain...                                                NaN    Starts On 8-10-2020 Ends On 8-31-2020                                                NaN  Anyone can learn to paint like the masters! Ov...
6             MOMENTUM: A Creative Industry Symposium                Vanguard Culture                                   Via Zoom    San Diego  California   92101              Virtual  https://www.eventbrite.com/e/momentum-a-creati...                             $10 suggested donation     Starts On 8-17-2020 Ends On 9-7-2020                                                NaN  MOMENTUM: A Creative Industry Symposium Monday...
7                    Virtual Locals Invitational Show        Art & Frames of Coronado                             936 ORANGE AVE     Coronado          CA   92118                    0  https://www.artsteps.com/view/5eed0ad62cd0d65b...                                               free     Starts On 8-21-2020 Ends On 8-1-2021                                                NaN  Art and Frames of Coronado invites you to our ...
8                                          HERE & Now          R.B. Stevenson Gallery              7661 Girard Avenue, Suite 101     La Jolla  California   92037    Central San Diego                  http://www.rbstevensongallery.com                                               Free    Starts On 8-22-2020 Ends On 9-25-2020                           Tuesday through Saturday  R.B.Stevenson Gallery is pleased to announce t...
9                     Art Unites Learning: Normal 2.0                      Art Unites                                        NaN    San Diego         NaN   92116    Central San Diego    https://www.facebook.com/events/956878098104971                                               Free    Starts On 8-25-2020 Ends On 8-25-2020                                                NaN  Please join us on Tuesday, August 25th as we: ...
10  Image Quest Sojourn; Visual Journaling for Per...        Pamela Underwood Studios                                    Virtual          NaN         NaN     NaN              Virtual  http://www.pamelaunderwood.com/event/new-onlin...                                            $595.00   Starts On 8-26-2020 Ends On 11-11-2020                                                NaN  Create a personal Image Quest resource journal...
11  Behind The Exhibition: Southern California Con...         Oceanside Museum of Art                          704 Pier View Way    Oceanside  California   92054              Virtual  https://oma-online.org/events/behind-the-exhib...            No fee required. Donations recommended.    Starts On 8-27-2020 Ends On 8-27-2020                                                NaN  Join curator Beth Smith and exhibitions manage...
12          Lay it on Thick, a Virtual Art Exhibition    San Diego Watercolor Society                    2825 Dewey Rd Bldg #202    San Diego  California   92106                    0                               https://www.sdws.org                                                NaN    Starts On 8-30-2020 Ends On 9-26-2020                                                NaN  The San Diego Watercolor Society proudly prese...
13      The Forum: Marketing & Branding for Creatives                Vanguard Culture                                   Via Zoom    San Diego          CA   92101      South San Diego                        http://vanguardculture.com/                              $5 suggested donation      Starts On 9-1-2020 Ends On 9-1-2020                                                NaN  Attention creative industry professionals! Joi...
14                       Write or Die Solo Exhibition                 You Belong Here                         3619 EL CAJON BLVD    San Diego          CA   92104    Central San Diego  http://www.youbelongsd.com/upcoming-events/wri...            $10 donation to benefit You Belong Here      Starts On 9-4-2020 Ends On 9-6-2020                                                NaN  Write or Die is an immersive installation and ...
15     SDVAN presents Art San Diego at Bread and Salt   San Diego Visual Arts Network                         1955 Julian Avenue     San Digo          CA   92113    Central San Diego  http://www.sdvisualarts.net and https://www.br...                                               Free    Starts On 9-5-2020 Ends On 10-24-2020                                                NaN  We are pleased to announce the four artist rec...
16               The Coming of Treganza Heritage Park  Lemon Grove Historical Society                          3185 Olive Street  Lemon Grove          CA   91945    Central San Diego                        http://www.lghistorical.org                                  Free for all ages    Starts On 9-10-2020 Ends On 9-10-2020  The park is open daily, 8 am to 8 pm. Covid 19...  Lemon Grove\'s central city park will be renam...
17               Online oil painting course | 4 weeks                             NaN                                        NaN          NaN         NaN     NaN              Virtual  http://concettaantico.com/live-online-oil-pain...                                                NaN    Starts On 9-14-2020 Ends On 10-5-2020                                                NaN  Over 4 weekly Zoom lessons, learn the techniqu...
18               Online oil painting course | 4 weeks                             NaN                                        NaN          NaN         NaN     NaN              Virtual  http://concettaantico.com/live-online-oil-pain...                                                NaN   Starts On 10-12-2020 Ends On 11-2-2020                                                NaN  Over 4 weekly Zoom lessons, learn the techniqu...
19                    36th Annual Mission Fed ArtWalk             Mission Fed ArtWalk                                 Ash Street    San Diego  California   92101    Central San Diego                          www.missionfedartwalk.org                                               Free    Starts On 11-7-2020 Ends On 11-8-2020                            Sat and Sun Nov 7 and 8  Mission Fed ArtWalk returns to San Diego’s Lit...
20             Mingei Pop Up Workshop: My Daruma Doll            New Childrens Museum                     200 West Island Avenue    San Diego  California   92101    Central San Diego                        http://thinkplaycreate.org/                                Free with admission  Starts On 11-13-2020 Ends On 11-13-2020                                                NaN  Join Mingei International Museum at The New Ch...
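
The loop in the answer relies on each detail table having its labels in column 0 and its values in column 1, and a lookup such as tempA.loc['Name',1] raises a KeyError if a label is missing from a table. A minimal sketch of that lookup on a hand-built table follows. The lookup helper and the sample rows are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical two-column table shaped like the detail tables in the answer:
# labels in column 0, values in column 1.
venue = pd.DataFrame([
    ["Name", "Example Gallery"],
    ["City", "San Diego"],
    ["Zip", "92101"],
])


def lookup(table, label):
    """Return the value next to `label`, or None if the label is absent."""
    indexed = table.set_index(0)  # returns a copy; `table` is left untouched
    if label not in indexed.index:
        return None
    return indexed.loc[label, 1]


print(lookup(venue, "City"))   # label present
print(lookup(venue, "State"))  # label absent, so None
```

Returning None for a missing label keeps all the result lists the same length, which the final pd.DataFrame({...}) call requires.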
