Identify table rows and table data - CSS Selector Python

Problem description

I have several table-row cases that I want to extract data from:

Case 1

 Onsite Service After Remote Diagnosis  April 19, 2014  April 19, 2017

Case 2

CAR                                     October 15, 2016    October 15, 2017    
Onsite Service After Remote Diagnosis   October 15, 2016    October 15, 2019

Case 3

NBD ProSupport                          July 16, 2008   July 15, 2011   
Onsite Service After Remote Diagnosis   July 16, 2008   July 15, 2011

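The information I need is on the row whose second td contains "Onsite Service After Remote Diagnosis": in every case it is the date at the right-hand end of that row.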

Expected output:

                      April 19, 2017
                    October 15, 2017
                       July 15, 2011
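The selection rule (find the row whose label cell reads "Onsite Service After Remote Diagnosis", then take its right-most date) can be checked offline before touching the browser. The sketch below uses only the standard library's xml.etree.ElementTree on a small, hypothetical fragment modeled on Case 3; the real Dell markup is not shown here, and the empty first td is an assumption based on the label sitting in the second td. ElementTree's limited XPath supports the [td='text'] predicate, which matches a tr that has a td child whose full text equals the label.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup modeled on Case 3; the real page may differ.
# The empty first <td> is assumed because the label sits in the second td.
SAMPLE = """
<table><tbody>
  <tr><td></td><td>NBD ProSupport</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
  <tr><td></td><td>Onsite Service After Remote Diagnosis</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
</tbody></table>
"""

LABEL = "Onsite Service After Remote Diagnosis"


def end_date(table_xml, label=LABEL):
    """Return the right-most cell text of the first row that has a td equal to label."""
    root = ET.fromstring(table_xml.strip())
    # [td='...'] keeps only rows with a <td> child whose complete text equals the label.
    row = root.find(f".//tr[td='{label}']")
    if row is None:
        return None
    cells = row.findall("td")
    return (cells[-1].text or "").strip()


print(end_date(SAMPLE))  # July 15, 2011
```

Matching on the label rather than on the row position is what makes the result independent of how many other rows (CAR, NBD ProSupport, ...) come before it.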

My code:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i, code in enumerate(codes):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % code)

        # Solve the captcha manually on the first page
        if i == 0:
            print("You now have 120 seconds to solve the captcha")
            time.sleep(120)
            print("120 seconds passed")

        # Extract data: positional selector (third td of the first matching row)
        expdate = driver.find_element(
            By.CSS_SELECTOR,
            "#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        dates.append(expdate.get_attribute('innerText'))
        print(dates[-1])
    driver.quit()
    return dates


codes = ['159DT3J', '15FDBG2', '10V8YZ1']
scrape(codes)

My output:

April 19, 2014
October 15, 2016
July 16, 2008

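These values come from the first row that appears (its third td, which holds the start date). I tried changing the tbody > tr > td:nth-child(3) part of the selector, but identifying the row by its text would be better and would avoid errors.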
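Standard CSS selectors cannot match an element by its text content (there is no :contains() in the CSS Selectors specification), so with CSS alone the usual approach is to select every row and do the text comparison in Python. A minimal sketch, assuming Selenium 4's find_element/find_elements(by, value) API; the strings "css selector" and "tag name" are the values of By.CSS_SELECTOR and By.TAG_NAME, used here only so the helper has no import-time dependency on Selenium. The table selector comes from the question; onsite_end_date is a hypothetical helper name.

```python
LABEL = "Onsite Service After Remote Diagnosis"
# Table part of the question's selector; rows are then filtered by text in Python.
ROWS_CSS = "#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr"


def onsite_end_date(driver, label=LABEL, rows_css=ROWS_CSS):
    """Hypothetical helper: last cell text of the row containing label, or None."""
    for row in driver.find_elements("css selector", rows_css):  # By.CSS_SELECTOR
        # Strip each cell so stray whitespace around the label does not break the match.
        cells = [td.text.strip() for td in row.find_elements("tag name", "td")]  # By.TAG_NAME
        if label in cells:
            return cells[-1]  # right-most cell: the end date
    return None
```

Inside the question's loop, expdate = onsite_end_date(driver) would replace the find_element call; it returns None instead of raising when no row carries the label.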

Tags: python, selenium, html-table, css-selectors

Solution


Since you need to extract the text for "Onsite Service After Remote Diagnosis", I suggest updating the line that locates the element to:

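expdate = driver.find_element(By.XPATH, "//td[normalize-space()='Onsite Service After Remote Diagnosis']"
                                        "/following-sibling::td[2]")

Here we use an XPath locator to find the td whose text is "Onsite Service After Remote Diagnosis" and then step to the cells next to it. The first following sibling is the start date (the same value the positional CSS selector returned), so following-sibling::td[2] selects the end date you want. normalize-space() trims surrounding whitespace, so a cell such as " Onsite Service After Remote Diagnosis " still matches. In Selenium 4 the find_element_by_* helpers are deprecated (and removed as of 4.3), so the locator is passed as By.XPATH, which requires from selenium.webdriver.common.by import By.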
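Folded back into the question's loop, an XPath-based version might look like the sketch below. It assumes Selenium 4's find_element(by, value) signature and keeps the question's manual-captcha pause. It takes following-sibling::td[2] because the first td after the label holds the start date, which is what the question's positional selector already returned. The driver is passed in as a parameter, and "xpath" is the value of By.XPATH; both are choices made for this sketch, not part of the original answer.

```python
import time

URL = ("https://www.dell.com/support/home/us/en/19/product-support/"
       "servicetag/%s/warranty?ref=captchasuccess")
# Second td after the label cell: td[1] is the start date, td[2] the end date.
END_DATE_XPATH = ("//td[normalize-space()='Onsite Service After Remote Diagnosis']"
                  "/following-sibling::td[2]")


def scrape(driver, codes, captcha_wait=120):
    """Return the end date for each service tag, in the order of codes."""
    dates = []
    for i, code in enumerate(codes):
        driver.get(URL % code)
        if i == 0 and captcha_wait:
            # Leave time to solve the captcha by hand on the first page.
            print("You now have %d seconds to solve the captcha" % captcha_wait)
            time.sleep(captcha_wait)
        cell = driver.find_element("xpath", END_DATE_XPATH)  # "xpath" == By.XPATH
        dates.append(cell.get_attribute("innerText").strip())
    return dates
```

To run it, create the driver with webdriver.Chrome(), call scrape(driver, ['159DT3J', '15FDBG2', '10V8YZ1']), and call driver.quit() in a finally block so the browser closes even if a locator fails.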

