How to use XPath to extract first three sentences from a paragraph?

Question

I need to scrape the first three sentences from a paragraph, if they exist, using XPath.

I've already isolated the paragraph I want using:

//h3[contains(., 'Synopsis')]/following-sibling::p[1]

Which returns a plain, unformatted paragraph:

What do we do when the world's walls - its family structures, its value-systems, it political forms - crumble? The central character of this novel, 'Moor' Zogoiby, only son of a wealthy, artistic-bohemian Bombay family, finds himself in such a moment of crisis. His mother, a famous painter and an emotional despot, worships beauty, but Moor is ugly, he has a deformed hand. Moor falls in love, with a married woman; when their secret is revealed, both are expelled; a suicide pact is proposed, but only the woman dies. Moor chooses to accept his fate, plunges into a life of depravity in Bombay, then becomes embroiled in a major financial scandal. The novel ends in Spain, in the studio of a painter who was a lover of Moor's mother: in a violent climax Moor has, one more, to decide whether to save the life of his lover by sacrificing his own. 

I only want the first three sentences. I'm willing to be lenient and ignore that first question mark; I just want whatever comes before the first three periods.

Tags: xpath

Answer


concat(
  substring-before(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'),
  '.',
  substring-before(substring-after(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'), '.'),
  '.',
  substring-before(substring-after(substring-after(//h3[contains(., 'Synopsis')]/following-sibling::p[1]/text(), '.'), '.'), '.'),
  '.'
)

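(It's fun to do crazy things with XPath, but in a real-life scenario I wouldn't use it for a task like this unless I were forced to by an absolute lack of other options.)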
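For comparison, here is a minimal sketch of the same cut done in the host language once the paragraph text has been extracted. It assumes Python and a plain string as input; the function name first_sentences and the sample text are made up for illustration and are not part of the original answer. Unlike the XPath expression, which would emit empty pieces followed by periods when the paragraph has fewer than three periods, this version returns the text unchanged in that case.

```python
def first_sentences(text: str, n: int = 3) -> str:
    """Return everything up to and including the n-th period in text.

    If text contains fewer than n periods, return it unchanged
    (apart from surrounding whitespace).
    """
    stripped = text.strip()
    # Split at most n times, so a text with at least n periods
    # yields exactly n + 1 pieces.
    parts = stripped.split(".", n)
    if len(parts) <= n:
        # Fewer than n periods: there is nothing to cut off.
        return stripped
    # Re-join the first n pieces and restore the final period.
    return ".".join(parts[:n]) + "."


# Hypothetical sample text; the leading question mark is ignored,
# as the question allows.
sample = "One? Two. Three. Four. Five."
print(first_sentences(sample))
```

The same leniency about the question mark applies here: only periods count as sentence boundaries, so abbreviations or decimal numbers would also be treated as sentence ends.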

