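python - Using Selenium/Python to extract a string from a div that contains many other divs with their own strings
Problem description
I need help extracting a comment from a div that contains many child divs, each of which has its own text. I have this code, which gets all the strings in the div.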
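from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

try:
    # Wait up to 40 seconds for the comment page to become clickable
    element = WebDriverWait(driver, 40).until(
        EC.element_to_be_clickable((By.XPATH, xpathCommentPage)))
    # List all WebElements holding comments of type "report"
    comments = driver.find_elements(By.XPATH, realtiveXpathToReportComments)
    comment = comments[0].get_attribute("innerText")
    print(comment)
    ## Gets this string:
    ## "Weekly Report. This line is a title and it can vary.
    ## 10 May 2021. This line is a date and it can vary.
    ## This is the comment I want to extract. The comment could be long.
    ## This is a optional comment at the end. I don't want this"
except (TimeoutException, IndexError):
    # Raised when the wait times out or when no comment elements were found
    print("could not find comments on comment page")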
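The HTML of the web element for the comment is shown below. As you can see, the wanted comment sits directly inside the outer div tag, which also contains all the other unnecessary text strings.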
<div data-testid="comment" type="report" class="sc-cTmXAz cwRvCX">
<div class="sc-bCwfaz sc-QfGIp gDveND hNwnuN">
<div class="sc-hepHJq gyIWHj">
<svg width="28" height="27" viewBox="0 0 28 27" xmlns="http://www.w3.org/2000/svg" data-testid="report">
<g transform="translate(.5)" fill="none" fill-rule="evenodd">
<rect fill="#E3DFF2" width="27" height="27" rx="10.916"></rect>
<path d="M16.009 10.89h-.602a2.73 2.73 0 00-2.723 2.916c.077 1.119.132 1.821.167 2.107.039.327.169 1.096.389 2.31a2.205 2.205 0 002.452 1.792 2.713 2.713 0 002.341-2.328c.128-.951.215-1.543.259-1.774.037-.194.162-.754.376-1.68a2.73 2.73 0 00-2.66-3.343zm6.038 4.327l-.343-.1a1.662 1.662 0 00-2.129 1.598c.001.322.003.52.007.595.005.106.037.398.093.877a1.36 1.36 0 001.515 1.192c.828-.1 1.53-.655 1.82-1.437l.172-.465a1.73 1.73 0 00-1.135-2.26zM7.453 6.436l-.565.057a3.411 3.411 0 00-3.038 3.842c.345 2.597.606 4.268.785 5.014.19.79.641 2.285 1.353 4.485a3.063 3.063 0 003.13 2.112 2.29 2.29 0 002.114-2.53c-.265-2.443-.409-4.119-.432-5.028-.02-.767.072-2.217.276-4.35a3.305 3.305 0 00-3.623-3.602z" fill="#8F7FCE"></path>
</g>
</svg>
<div data-testid="activity-title" class="sc-dmiYbj bEyWbu"><span class="sc-jVBfSZ keEVDi">Weekly Report. This line is a title and it can vary.</span>
</div>
</div>
<div class="sc-bCwfaz sc-jomqko hGiREx gbkTPQ">
<div class="sc-bXXDC jSxSkF">10 May 2021. This line is a date and it can vary. </div>
</div>
</div>This is the comment I want to extract. The comment could be long or short.
<div data-testid="comment-reply-text" class="sc-ckXLN jnxISp">This is a optional comment at the end. I don't want this</div>
<div class="sc-iArHnM eEOXNH">
<form data-testid="reply-form">
<div class="sc-jfkLlK fHByXR">
<textarea data-testid="text-area" placeholder="Vänligen skriv ditt svar här" name="replyMessage" id="replyMessage" rows="3" height="auto" class="sc-fcmMJX ldYVmw"></textarea>
</div>
<div class="sc-bHCRaJ gPCKhJ">
<button data-testid="comment-reply-cancel-button" type="button" class="sc-ckTSus sc-fzJAIQ kiNKGC irzfTA">Avbryt</button>
<button data-testid="comment-reply-submit-button" type="submit" class="sc-lbVvki hxgvIg">Skicka</button>
</div>
</form>
</div>
</div>
How can we extract the wanted comment? Could we use a regular expression somehow?
Solution
Try this:
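from selenium.webdriver.common.by import By
# Rendered text lines: [0] title, [1] date, [2] comment (assumes the comment has no line breaks)
textValue = driver.find_element(By.XPATH, ".//div[@data-testid='comment']").text.split("\n")[2]
print(textValue)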
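Splitting on line breaks depends on the title and date each taking exactly one line and on the comment itself containing no newline. As a hedged alternative, here is a minimal sketch that keeps only the text nodes sitting directly inside the outer div and ignores everything nested in child elements. It uses only the standard library's `html.parser`; the names `DirectTextParser` and `direct_text` are made up for this illustration, and the sample markup is a shortened stand-in for the HTML in the question.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DirectTextParser(HTMLParser):
    """Collects text nodes that are direct children of the root element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # 1 means "directly inside the root element"
        self.chunks = []    # direct text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only when it is a direct child of the root.
        if self.depth == 1 and data.strip():
            self.chunks.append(data.strip())


def direct_text(outer_html):
    """Return the root element's own text, skipping text of child elements."""
    parser = DirectTextParser()
    parser.feed(outer_html)
    parser.close()
    return " ".join(parser.chunks)


# Shortened stand-in for the comment element shown in the question.
sample = (
    '<div data-testid="comment" type="report">'
    '<div><div><span>Weekly Report.</span></div><div>10 May 2021.</div></div>'
    'This is the comment I want to extract.'
    '<div data-testid="comment-reply-text">Optional reply.</div>'
    '</div>'
)
print(direct_text(sample))
```

With Selenium, the input would presumably come from `comments[0].get_attribute("outerHTML")`. Because the comment is selected by its position in the DOM rather than by line index, this sketch should keep working when the comment spans several lines or when the optional reply is missing.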