Extracting a string with Selenium/Python from a div that contains many other divs with their own strings

Question

I need help extracting a comment from a div that contains many child divs, each with its own text. I have this code, which gets all the strings in the div.

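from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

try:
    # Wait until the comment page has loaded.
    element = WebDriverWait(driver, 40).until(
        EC.element_to_be_clickable((By.XPATH, xpathCommentPage)))

    # List all WebElements holding comments of type "report".
    comments = driver.find_elements(By.XPATH, realtiveXpathToReportComments)
    comment = comments[0].get_attribute("innerText")
    print(comment)
    ## Gets this string:
    ## "Weekly Report. This line is a title and it can vary.
    ## 10 May 2021. This line is a date and it can vary.
    ## This is the comment I want to extract. The comment could be long.
    ## This is a optional comment at the end. I don't want this"

except (TimeoutException, IndexError):
    # Raised when the wait times out or no matching comment exists.
    print("could not find comments on comment page")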

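The HTML of the web element for a comment looks like this (see below). As you can see, the comment I want sits directly inside the outer div tag, which also contains all the other unwanted text strings.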

<div data-testid="comment" type="report" class="sc-cTmXAz cwRvCX">
    <div class="sc-bCwfaz sc-QfGIp gDveND hNwnuN">
        <div class="sc-hepHJq gyIWHj">
            <svg width="28" height="27" viewBox="0 0 28 27" xmlns="http://www.w3.org/2000/svg" data-testid="report">
                <g transform="translate(.5)" fill="none" fill-rule="evenodd">
                    <rect fill="#E3DFF2" width="27" height="27" rx="10.916"></rect>
                    <path d="M16.009 10.89h-.602a2.73 2.73 0 00-2.723 2.916c.077 1.119.132 1.821.167 2.107.039.327.169 1.096.389 2.31a2.205 2.205 0 002.452 1.792 2.713 2.713 0 002.341-2.328c.128-.951.215-1.543.259-1.774.037-.194.162-.754.376-1.68a2.73 2.73 0 00-2.66-3.343zm6.038 4.327l-.343-.1a1.662 1.662 0 00-2.129 1.598c.001.322.003.52.007.595.005.106.037.398.093.877a1.36 1.36 0 001.515 1.192c.828-.1 1.53-.655 1.82-1.437l.172-.465a1.73 1.73 0 00-1.135-2.26zM7.453 6.436l-.565.057a3.411 3.411 0 00-3.038 3.842c.345 2.597.606 4.268.785 5.014.19.79.641 2.285 1.353 4.485a3.063 3.063 0 003.13 2.112 2.29 2.29 0 002.114-2.53c-.265-2.443-.409-4.119-.432-5.028-.02-.767.072-2.217.276-4.35a3.305 3.305 0 00-3.623-3.602z" fill="#8F7FCE"></path>
                </g>
            </svg>
            <div data-testid="activity-title" class="sc-dmiYbj bEyWbu"><span class="sc-jVBfSZ keEVDi">Weekly Report. This line is a title and it can vary.</span>
            </div>
        </div>
        <div class="sc-bCwfaz sc-jomqko hGiREx gbkTPQ">
            <div class="sc-bXXDC jSxSkF">10 May 2021. This line is a date and it can vary. </div>
        </div>
    </div>This is the comment I want to extract. The comment could be long or short. 
    <div data-testid="comment-reply-text" class="sc-ckXLN jnxISp">This is a optional comment at the end. I don't want this</div>
    <div class="sc-iArHnM eEOXNH">
        <form data-testid="reply-form">
            <div class="sc-jfkLlK fHByXR">
                <textarea data-testid="text-area" placeholder="Vänligen skriv ditt svar här" name="replyMessage" id="replyMessage" rows="3" height="auto" class="sc-fcmMJX ldYVmw"></textarea>
            </div>
            <div class="sc-bHCRaJ gPCKhJ">
                <button data-testid="comment-reply-cancel-button" type="button" class="sc-ckTSus sc-fzJAIQ kiNKGC irzfTA">Avbryt</button>
                <button data-testid="comment-reply-submit-button" type="submit" class="sc-lbVvki hxgvIg">Skicka</button>
            </div>
        </form>
    </div>
</div>

How can I extract the wanted comment? Could a regular expression be used somehow?

Tags: python, selenium

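Answer

Try this. It assumes the rendered text puts the title and the date on the first two lines, as in the output shown in the question, so the wanted comment is the third line (index 2):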

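from selenium.webdriver.common.by import By
# .text returns the visible text; split it into lines and take the third one.
textValue = driver.find_element(By.XPATH, ".//div[@data-testid='comment']").text.split("\n")[2]
print(textValue)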
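
Picking a fixed line index can break if the title or date ever wraps differently or is missing. As an alternative, here is a minimal sketch (not part of the original answer) that keeps only the text nodes sitting directly under the outer div and ignores everything inside child elements. It uses Python's standard html.parser module and assumes the HTML string comes from something like comments[0].get_attribute("outerHTML"); the names DirectTextParser, extract_direct_text and sample are hypothetical, and sample is a shortened stand-in for the markup in the question.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DirectTextParser(HTMLParser):
    """Collect text nodes that are direct children of the root element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open elements
        self.chunks = []  # non-empty text found directly under the root

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        # depth == 1 means only the root element is open at this point.
        text = data.strip()
        if self.depth == 1 and text:
            self.chunks.append(text)


def extract_direct_text(outer_html):
    """Return the text directly inside the root element of outer_html."""
    parser = DirectTextParser()
    parser.feed(outer_html)
    parser.close()
    return " ".join(parser.chunks)


# Shortened, hypothetical stand-in for the comment element's outerHTML.
sample = (
    '<div data-testid="comment" type="report">'
    '<div><div><span>Weekly Report.</span></div><div>10 May 2021.</div></div>'
    'This is the comment I want to extract. '
    '<div data-testid="comment-reply-text">Optional reply.</div>'
    '</div>'
)

print(extract_direct_text(sample))
```

With a live driver, the call would look like extract_direct_text(comments[0].get_attribute("outerHTML")). Because the function works on structure rather than line position, a missing optional reply or a multi-line comment should not shift the result, and no regular expression is needed.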
