How to web-scrape a JSP page with Python, Selenium, and BeautifulSoup?

Problem description

I am an absolute beginner at web scraping with Python. I am trying to extract ATM locations from this URL:

https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))

using the following code:

#Script to scrape locations and addresses from VISA's ATM locator


# import the necessary libraries (to be installed if not available):

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd


#ChromeDriver
#(see https://chromedriver.chromium.org/getting-started as reference)

driver = webdriver.Chrome("C:/Users/DefaultUser/Local Settings/Application Data/Google/Chrome/Application/chromedriver.exe")

offices=[] #List to branches/ATM names
addresses=[] #List to branches/ATM locations
driver.get("https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))") 


content = driver.page_source
soup = BeautifulSoup(content, features = "lxml")


#the following code extracts all the content inside the tags displaying the information requested

for a in soup.findAll('li',attrs={'class':'visaATMResultListItem'}): 
    name=a.find('li', attrs={'class':'data-label'}) 
    address=a.find('li', attrs={'class':'data-label'}) 
    offices.append(name.text)
    addresses.append(address.text)


#next row defines the dataframe with the results of the extraction

df = pd.DataFrame({'Office':offices,'Address':addresses})


#next row displays dataframe content

print(df)


#export data to .CSV file named 'branches.csv'
with open('branches.csv', 'a') as f:
    df.to_csv(f, header=True)

At first the script appears to work: ChromeDriver starts and the browser displays the requested results. However, the script itself returns nothing:

Empty DataFrame
Columns: [Office, Address]
Index: []
Process finished with exit code 0

Maybe I made a mistake when choosing the selectors?

Thank you very much for your help.

Tags: python, pandas, selenium, web-scraping, beautifulsoup

Solution


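The problem is with the locators: the inner lookups search for 'li' tags with class 'data-label', which do not match the name and address elements. Use: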

for a in soup.findAll('li',attrs={'class':'visaATMResultListItem'}):
    # a class name cannot contain whitespace, so match 'visaATMPlaceName' without a trailing space
    name = a.find('p', attrs={'class':'visaATMPlaceName'})
    address = a.find('p', attrs={'class':'visaATMAddress'})
    offices.append(name.text)
    addresses.append(address.text)
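
To check the selectors without launching a browser, the same lookup can be tried on a small static snippet. The sketch below is only an illustration: the HTML is hypothetical markup built around the class names quoted in the answer, not a copy of the real page, and `extract_atms` is a made-up helper name. It uses the built-in `html.parser` instead of `lxml` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, shaped after the class names used in the answer above.
# The real page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="visaATMResultListItem">
    <p class="visaATMPlaceName ">Example Bank ATM</p>
    <p class="visaATMAddress">1-1 Example Street, Tokyo</p>
  </li>
  <li class="visaATMResultListItem">
    <p class="visaATMPlaceName ">Entry without address</p>
  </li>
</ul>
"""


def extract_atms(html):
    """Return a list of (name, address) tuples, one per result item in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="visaATMResultListItem"):
        name = item.find("p", class_="visaATMPlaceName")
        address = item.find("p", class_="visaATMAddress")
        # find() returns None when a tag is missing, so guard before reading text
        rows.append((
            name.get_text(strip=True) if name else "",
            address.get_text(strip=True) if address else "",
        ))
    return rows


rows = extract_atms(SAMPLE_HTML)
print(rows)
```

With Selenium, the same function could be called as `extract_atms(driver.page_source)`. If the result list is filled in by JavaScript after the initial load, reading `page_source` too early may also produce an empty result, so it is probably worth confirming that the items are present in the page source before blaming the selectors.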
