xpath - How to use XPath to extract first three sentences from a paragraph?
Problem description
I need to scrape the first three sentences from a paragraph, if they exist, using XPath.
I've already isolated the paragraph I want using:
//h3[contains(., 'Synopsis')]/following-sibling::p[1]
Which returns a plain, unformatted paragraph:
What do we do when the world's walls - its family structures, its value-systems, it political forms - crumble? The central character of this novel, 'Moor' Zogoiby, only son of a wealthy, artistic-bohemian Bombay family, finds himself in such a moment of crisis. His mother, a famous painter and an emotional despot, worships beauty, but Moor is ugly, he has a deformed hand. Moor falls in love, with a married woman; when their secret is revealed, both are expelled; a suicide pact is proposed, but only the woman dies. Moor chooses to accept his fate, plunges into a life of depravity in Bombay, then becomes embroiled in a major financial scandal. The novel ends in Spain, in the studio of a painter who was a lover of Moor's mother: in a violent climax Moor has, one more, to decide whether to save the life of his lover by sacrificing his own.
I only want the first three sentences. I'm willing to be lenient and ignore that first question mark; I just want whatever comes before the first three periods.
Solution
concat(
  substring-before(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'),
  '.',
  substring-before(substring-after(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'), '.'),
  '.',
  substring-before(substring-after(substring-after(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'), '.'), '.'),
  '.'
)
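(Doing crazy things with XPath is fun, but in a real-life scenario I wouldn't use it for a task like this unless I were forced to by an absolute lack of other options.)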
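If the host language is available, the trimming can be done there after the simpler expression //h3[contains(., 'Synopsis')]/following-sibling::p[1] has returned the paragraph text. The following is only a minimal sketch in Python, assuming the text has already been extracted into a string; the function name first_sentences and its parameters are made up for illustration. Unlike the concat() expression, which in XPath 1.0 would still append all three periods when the paragraph contains fewer than three, this version returns short paragraphs unchanged.

```python
def first_sentences(text: str, count: int = 3, sep: str = ".") -> str:
    """Return everything up to and including the first `count` separators.

    Hypothetical helper: if `text` contains fewer than `count` separators,
    it is returned unchanged rather than padded with extra separators.
    """
    # Split at most `count` times, giving at most `count + 1` pieces.
    parts = text.split(sep, count)
    if len(parts) <= count:
        # Fewer than `count` separators were found; nothing to trim.
        return text
    # Rejoin the first `count` pieces and restore the final separator.
    return sep.join(parts[:count]) + sep


if __name__ == "__main__":
    # Stand-in for the paragraph text returned by the XPath query.
    paragraph = "First one. Second one. Third one. Fourth one."
    print(first_sentences(paragraph))
```

As in the question, a question mark is not treated as a sentence boundary here; only the period counts.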