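python-3.x - I am trying to get table data from a website using Python urllib and Beautiful Soup, but it returns a script
Problem description
I tried BeautifulSoup, but it scrapes the script from the URL rather than the table.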
url = 'https://ekartlogistics.com/shipmenttrack/FMPP0944216480'
from bs4 import BeautifulSoup
from urllib import request, parse
read = request.urlopen(url)
soup = BeautifulSoup(read, 'html.parser')
print(soup.prettify())
It returns that script along with the rest of the page's HTML.
I am trying to get the table data from this URL.
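One way to check what the downloaded page actually contains is to count the tags in the raw HTML. The sketch below uses only the standard library's html.parser. The names `TagCounter` and `count_tags` and the `static_html` sample are illustrative assumptions, not the real response from the site; in practice you would pass in the decoded result of `request.urlopen(url).read()`.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to its number of occurrences."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


# Hypothetical stand-in for a page whose table is filled in later by JavaScript.
static_html = """
<html><body>
  <div id="app"></div>
  <script src="bundle.js"></script>
  <script>loadShipment("FMPP0944216480");</script>
</body></html>
"""

counts = count_tags(static_html)
print("tables:", counts.get("table", 0))
print("scripts:", counts.get("script", 0))
```

If the table count is zero while the script count is not, the table is most likely built in the browser after the page loads, and a plain HTTP fetch will never see it.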
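Solution
The data at this URL is loaded dynamically by JavaScript, so BeautifulSoup alone cannot retrieve it: the HTML that urllib downloads contains only the scripts, not the rendered table. You can use a browser automation tool such as Selenium instead. Here I use Selenium to let a real browser run the JavaScript, then pandas to read the table data, as follows:
Code: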
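from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4.6+ locates a matching chromedriver automatically
driver = webdriver.Chrome()
try:
    driver.get("https://ekartlogistics.com/shipmenttrack/FMPP0944216480")
    # Wait until JavaScript has rendered the table instead of sleeping for a fixed time
    table = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "table.table"))
    )
    # Newer pandas versions expect a file-like object rather than a raw HTML string
    df = pd.read_html(StringIO(table.get_attribute("outerHTML")))[0]
    print(df)
finally:
    # Always close the browser, even if the table never appears
    driver.quit()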
Output:
   Date                  Time         Place       Status
0  Sunday 17 October     04:24:26 PM  Kolkata     Shipment Created
1  Sunday 17 October     04:24:31 PM  Kolkata     Dispatched to CentralHub_BAG
2  Sunday 17 October     04:56:00 PM  Kolkata     Received at CentralHub_BAG
3  Sunday 17 October     04:56:03 PM  Kolkata     Received at CentralHub_BAG
4  Monday 18 October     03:10:35 AM  Patna       Dispatched to CentralHub_BHT
5  Tuesday 19 October    04:48:44 AM  Patna       Received at CentralHub_BHT
6  Tuesday 19 October    05:03:44 PM  Samastipur  Dispatched to SatelliteHub_SAMA
7  Wednesday 20 October  02:47:44 AM  Samastipur  Received at SatelliteHub_SAMA
8  Thursday 21 October   09:21:52 AM  Samastipur  Out For Delivery
9  Friday 22 October     07:38:36 AM  Samastipur  Delivered
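If pandas is not available, the table's outerHTML can also be turned into rows with the standard library alone. This is not the pandas route used in the answer, just a minimal alternative sketch. `TableRows` and `table_to_rows` are illustrative names, and the `sample` markup is an assumption about the table's shape (two data rows borrowed from the output above); the real page's markup may differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_rows(table_html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Hypothetical markup shaped like the tracking table.
sample = """
<table class="table">
  <tr><th>Date</th><th>Time</th><th>Place</th><th>Status</th></tr>
  <tr><td>Sunday 17 October</td><td>04:24:26 PM</td><td>Kolkata</td><td>Shipment Created</td></tr>
  <tr><td>Friday 22 October</td><td>07:38:36 AM</td><td>Samastipur</td><td>Delivered</td></tr>
</table>
"""

header, *records = table_to_rows(sample)
print(header)
for record in records:
    print(dict(zip(header, record)))
```

With Selenium, the string passed to `table_to_rows` would be the same `outerHTML` value that the answer hands to pandas. This sketch does not handle nested tables or cells that span several columns.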