How to web scrape a JSP page with Python, Selenium and BeautifulSoup?
Problem description
I am an absolute beginner at web scraping with Python. I am trying to extract the locations of ATMs from this URL:
https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))
using the following code:
# Script to scrape locations and addresses from VISA's ATM locator
# Import the necessary libraries (install them first if they are not available):
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
# ChromeDriver
# (see https://chromedriver.chromium.org/getting-started as reference)
driver = webdriver.Chrome("C:/Users/DefaultUser/Local Settings/Application Data/Google/Chrome/Application/chromedriver.exe")
offices = []    # list of branch/ATM names
addresses = []  # list of branch/ATM locations
driver.get("https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))")
content = driver.page_source
soup = BeautifulSoup(content, features="lxml")
# The following loop extracts the content of the tags that display the requested information
for a in soup.findAll('li', attrs={'class': 'visaATMResultListItem'}):
    name = a.find('li', attrs={'class': 'data-label'})
    address = a.find('li', attrs={'class': 'data-label'})
    offices.append(name.text)
    addresses.append(address.text)
# Build a DataFrame from the extracted results
df = pd.DataFrame({'Office': offices, 'Address': addresses})
# Display the DataFrame
print(df)
# Export the data to a CSV file named 'branches.csv'
with open('branches.csv', 'a') as f:
    df.to_csv(f, header=True)
At first the script seems to work: ChromeDriver starts and the browser displays the results as requested, but no results are returned:
Empty DataFrame
Columns: [Office, Address]
Index: []
Process finished with exit code 0
Could I have made a mistake in choosing the selectors?
Thanks a lot for your help.
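An empty DataFrame (rather than an AttributeError from name.text, which the inner find() calls would cause if they returned None) means the outer findAll('li', ...) loop never ran, so besides the selectors it is worth ruling out timing: driver.get() returns once the page has loaded, and results that the page's JavaScript inserts afterwards may not yet be in page_source. The following is a hedged sketch, not part of the original thread. It assumes Selenium 4.6 or later (so webdriver.Chrome() can locate a driver via Selenium Manager), an arbitrary 15-second timeout, and that each result is an li with class visaATMResultListItem; fetch_rendered_html is a hypothetical helper name.

```python
def fetch_rendered_html(url, timeout=15):
    """Load url in Chrome and return its HTML once a result item is present."""
    # Selenium is imported inside the function (a choice made for this sketch)
    # so that merely importing this module does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumes Selenium 4.6+, where Selenium Manager finds a matching
    # chromedriver, so no executable path is passed here.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one result <li> exists in the DOM; raises
        # selenium.common.exceptions.TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "li.visaATMResultListItem")
            )
        )
        return driver.page_source
    finally:
        # Close the browser even if the wait times out.
        driver.quit()
```

The returned HTML can be passed to BeautifulSoup exactly as content is in the question. If the selector is wrong, the wait ends in a TimeoutException instead of a silently empty DataFrame.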
Solution
The problem is in the locators. Use:
for a in soup.findAll('li', attrs={'class': 'visaATMResultListItem'}):
    name = a.find('p', attrs={'class': 'visaATMPlaceName'})
    address = a.find('p', attrs={'class': 'visaATMAddress'})
    offices.append(name.text)
    addresses.append(address.text)
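For reference, a self-contained sketch of the extraction step with the answer's class names. It is an illustration, not the answerer's code: it swaps find() for BeautifulSoup's CSS-selector methods select()/select_one(), uses the built-in "html.parser" instead of lxml, skips items that lack a name or address instead of raising AttributeError, and runs against a small hypothetical HTML snippet (SAMPLE_HTML) rather than the real Visa page, whose markup may differ. parse_atm_results is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def parse_atm_results(html):
    """Return (name, address) pairs from the rendered ATM result list."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # "p.visaATMPlaceName" matches any <p> whose class list contains that
    # value, so trailing whitespace in the class attribute does not matter.
    for item in soup.select("li.visaATMResultListItem"):
        name = item.select_one("p.visaATMPlaceName")
        address = item.select_one("p.visaATMAddress")
        # Skip incomplete items instead of failing on None.text.
        if name is None or address is None:
            continue
        results.append((name.get_text(strip=True), address.get_text(strip=True)))
    return results


# Hypothetical markup modeled on the class names above; not the real page.
SAMPLE_HTML = """
<ul>
  <li class="visaATMResultListItem">
    <p class="visaATMPlaceName ">Example ATM A</p>
    <p class="visaATMAddress">1-1 Example, Tokyo</p>
  </li>
  <li class="visaATMResultListItem">
    <p class="visaATMPlaceName">Example ATM B</p>
  </li>
</ul>
"""

if __name__ == "__main__":
    # The second item has no address, so only the first pair is returned.
    print(parse_atm_results(SAMPLE_HTML))
```

The returned pairs map directly onto the question's DataFrame, e.g. pd.DataFrame(parse_atm_results(content), columns=['Office', 'Address']).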