How do I split text into a list of sentences?

Problem description

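I'm trying to write a function that counts the words and computes the average word length in any given sentence or sentences. Assuming each sentence ends with a period, I can't seem to split the string into two sentences and put them in a list.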

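import string

def word_length_list(text):
    text = text.replace('--', ' ')

    # Strip ASCII punctuation plus curly quotes. This also removes every '.',
    # so the split(".") below has nothing left to split on.
    for p in string.punctuation + "‘’”“":
        text = text.replace(p, '')

    text = text.lower()
    words = text.split(".")   # ends up as a single-element list
    word_length = []
    print(words)

    # Counts the characters of each element (here, the whole text),
    # not the length of each individual word.
    for i in words:
        count = 0
        for j in i:
            count = count + 1
        word_length.append(count)

    return word_length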

testing1 = word_length_list("Haven't you eaten 8 oranges today? I don't know if you did.")
print(sum(testing1)/len(testing1))
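For the goal described in the question, one possible sketch is to split into sentences first and strip punctuation only afterwards, since removing punctuation first deletes the delimiters. It assumes every sentence ends with ".", "?" or "!" followed by whitespace; `sentence_stats` is a hypothetical name, not part of the original post.

```python
import re
import string

# Characters to delete from each sentence: ASCII punctuation plus curly quotes.
_STRIP = str.maketrans('', '', string.punctuation + "‘’”“")

def sentence_stats(text):
    """Return a list of (word_count, average_word_length), one per sentence.

    Assumption: sentences end with '.', '?' or '!' followed by whitespace.
    """
    # Split BEFORE removing punctuation, otherwise the delimiters are gone.
    sentences = re.split(r'(?<=[.?!])\s+', text.strip())
    stats = []
    for sentence in sentences:
        cleaned = sentence.replace('--', ' ').translate(_STRIP)
        words = cleaned.lower().split()
        if not words:  # skip empty input or punctuation-only fragments
            continue
        average = sum(len(w) for w in words) / len(words)
        stats.append((len(words), average))
    return stats

print(sentence_stats("Haven't you eaten 8 oranges today? I don't know if you did."))
```

For the sample text this yields 6 words per sentence, with average lengths of 4.5 and roughly 2.83 ("8" counts as a one-letter word and "Haven't" becomes "havent").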

Tags: python

Solution


One option is to use re.split:

import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."
# Split on whitespace that follows ?, . or !; the lookbehind keeps the punctuation.
sentences = re.split(r'(?<=[?.!])\s+', inp)
print(sentences)

This prints:

["Haven't you eaten 8 oranges today?", "I don't know if you did."]
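The lookbehind `(?<=[?.!])` is what keeps each terminator attached to its sentence: it is zero-width, so it only checks the preceding character without consuming it. A small comparison, assuming the same input string:

```python
import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."

# Consuming the terminator: '?' becomes part of the delimiter and is dropped.
consumed = re.split(r'[?.!]\s+', inp)

# Zero-width lookbehind: only the whitespace is the delimiter, so '?' survives.
kept = re.split(r'(?<=[?.!])\s+', inp)

print(consumed)  # ["Haven't you eaten 8 oranges today", "I don't know if you did."]
print(kept)      # ["Haven't you eaten 8 oranges today?", "I don't know if you did."]
```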

We could also use re.findall:

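import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."
# Lazily match up to the next ?, ! or .; the leading \S skips the space between sentences.
sentences = re.findall(r'\S.*?[?!.]', inp)
print(sentences)  # prints the same list as above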
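The findall approach depends on the quantifier being lazy, and a plain `.*?` also captures the whitespace before each later sentence. The sketch below, on the same sample input, compares greedy and lazy matching and shows one way to drop that leading space:

```python
import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."

# Greedy .* runs to the LAST terminator, so the whole text is a single match.
greedy = re.findall(r'.*[?!.]', inp)

# Lazy .*? stops at the FIRST terminator, giving one match per sentence,
# but every match after the first begins with the preceding space.
lazy = re.findall(r'.*?[?!.]', inp)

# Stripping each match removes that leading whitespace.
sentences = [s.strip() for s in lazy]

print(len(greedy))       # 1
print(repr(lazy[1]))     # " I don't know if you did."
print(sentences)
```

Starting the pattern with `\S` instead of stripping afterwards has the same effect on this input.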

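Note that in both cases we assume a period only ever ends a sentence and is never part of an abbreviation. If a period can appear in more than one context, splitting sentences becomes tricky. For example: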

Jon L. Skeet earned more points than anyone.  Gordon Linoff also earned a lot of points.

Here it is unclear whether each period ends a sentence or is part of an abbreviation.
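One rough heuristic for the initials case (a swapped-in technique, not part of the answer above) is to refuse to split after a period that directly follows a single capital letter, using a negative lookbehind. `SENTENCE_BOUNDARY` and `split_sentences` are hypothetical names, and the rule will wrongly keep together sentences that really do end in a lone capital letter (e.g. "plan B."):

```python
import re

# Hypothetical heuristic: split on whitespace after ., ? or !, EXCEPT when the
# period directly follows a single capital letter (an initial such as "L.").
SENTENCE_BOUNDARY = re.compile(r'(?<=[.?!])(?<!\b[A-Z]\.)\s+')

def split_sentences(text):
    """Split text into sentences, treating single-letter initials as abbreviations."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]

text = "Jon L. Skeet earned more points than anyone.  Gordon Linoff also earned a lot of points."
print(split_sentences(text))
# ['Jon L. Skeet earned more points than anyone.', 'Gordon Linoff also earned a lot of points.']
```

Other abbreviations such as "Dr." or "e.g." would need further rules, which is why this remains a heuristic rather than a general solution.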

