How to extract a certain sentence in a paragraph? Python


I want to extract certain sentences from a paragraph by looking for a marker phrase such as "Object C Statement:". The paragraph is as follows:

Object A Statement: There was a cat with a bag full of meat. It was a red cat with a blue hat. Object B Statement: There was a dog with a bag full of toys. It was a blue dog with a green hat. Object C Statement: There was a dolphin with a bag full of bubbles. It was a purple dolphin with an orange hat. Object D Statement: There was a zebra with a bag full of grass. It was a white zebra with a blue hat. Object E Statement: There was a bear with a bag full of wood. It was a brown bear with a black hat.

I want to extract the text that follows "Object C Statement:", like this:

There was a dolphin with a bag full of bubbles. It was a purple dolphin with an orange hat.

All the examples I have come across only split on a single specific word.

I tried the following two approaches, but neither works for me:

word="Object A Statement: There was a cat with a bag full of meat. It was a red cat with a blue hat. Object B Statement: There was a dog with a bag full of toys. It was a blue dog with a green hat. Object C Statement: There was a dolphin with a bag full of bubbles. It was a purple dolphin with an orange hat. Object D Statement: There was a zebra with a bag full of grass. It was a white zebra with a blue hat. Object E Statement: There was a bear with a bag full of wood. It was a brown bear with a black hat."
a, b, c, d, e = re.split(r"\B\s(?=[^\s:]+:)", word)
regex = re.compile(r"""Object A Statement\s(.*?)Object B Statement\s(.*?)Object C Statement\s(.*?)Object D Statement\s(.*?)Object E Statement\s(.*)""", re.S|re.X)
a, b, c, d, e = regex.match(word).groups()

Tags: python, nlp, re


You can split the string on the pattern "\s*Object . Statement:\s*" with re.split:

import re

word="Object A Statement: There was a cat with a bag full of meat. It was a red cat with a blue hat. Object B Statement: There was a dog with a bag full of toys. It was a blue dog with a green hat. Object C Statement: There was a dolphin with a bag full of bubbles. It was a purple dolphin with an orange hat. Object D Statement: There was a zebra with a bag full of grass. It was a white zebra with a blue hat. Object E Statement: There was a bear with a bag full of wood. It was a brown bear with a black hat."
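# Split on each "Object X Statement:" header, consuming the whitespace around it.
result = re.split(r"\s*Object . Statement:\s*", word)
# The text starts with a header, so the first piece is an empty string; drop it.
result = [r for r in result if r]
print("\n".join(result))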

I get the following result. The Object C statement is the third element, result[2]:

There was a cat with a bag full of meat. It was a red cat with a blue hat.
There was a dog with a bag full of toys. It was a blue dog with a green hat.
There was a dolphin with a bag full of bubbles. It was a purple dolphin with an orange hat.
There was a zebra with a bag full of grass. It was a white zebra with a blue hat.
There was a bear with a bag full of wood. It was a brown bear with a black hat.
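
If you only need one statement (Object C, say) and don't want to rely on its position in the list, here is a minimal sketch that keeps the labels. It assumes every header has the form "Object <label> Statement:". The function name statements_by_label and the shortened sample text are just for illustration, not part of the original question. Putting a capturing group in the pattern makes re.split return the captured labels in between the pieces, so they can be paired up with the text that follows them.

```python
import re

# Shortened sample paragraph (illustrative; same layout as the one in the question).
text = (
    "Object A Statement: There was a cat with a bag full of meat. "
    "It was a red cat with a blue hat. "
    "Object C Statement: There was a dolphin with a bag full of bubbles. "
    "It was a purple dolphin with an orange hat. "
    "Object D Statement: There was a zebra with a bag full of grass. "
    "It was a white zebra with a blue hat."
)


def statements_by_label(paragraph):
    """Map each label ('A', 'C', ...) to the text that follows its header."""
    # With a capturing group, re.split returns
    # [text_before_first_header, label1, body1, label2, body2, ...].
    parts = re.split(r"\s*Object (\w+) Statement:\s*", paragraph)
    labels = parts[1::2]   # every captured label
    bodies = parts[2::2]   # the text after each header
    return {label: body.strip() for label, body in zip(labels, bodies)}


statements = statements_by_label(text)
print(statements["C"])
```

Looking up statements["C"] then gives only the dolphin sentences, and a label that is missing from the text raises a KeyError instead of silently returning the wrong statement.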
