Punctuation pattern analyzer

Problem

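Goal: I am studying the punctuation patterns that appear in sentences. The following code (found online) returns the punctuation marks of my_str in order, "!.;." for the sample string. I am trying to extract each sentence together with its punctuation and write the result to Excel in the expected output format.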

# define the characters treated as punctuation
punctuations = '''.?![]()"",:;-/'''

my_str = "Really! This is a sample sentence. The cat sat on the mat; the dog slept."

# To take input from the user
# my_str = input("Enter a string: ")

# collect the punctuation characters from the string, in order of appearance
punct = ""
for char in my_str:
    if char in punctuations:
        punct = punct + char

# display the punctuation found (!.;. for the sample string)
print(punct)
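As a sketch (the helper name punct_pattern is hypothetical, not part of the original code), the loop above can be wrapped in a function so it can be applied to each sentence separately instead of to the whole string:

```python
# Characters treated as punctuation (same set as in the question)
punctuations = '''.?![]()"",:;-/'''


def punct_pattern(text: str) -> str:
    """Hypothetical helper: return the punctuation characters of `text`, in order."""
    return "".join(char for char in text if char in punctuations)


my_str = "Really! This is a sample sentence. The cat sat on the mat; the dog slept."
print(punct_pattern(my_str))                                # !.;.
print(punct_pattern("Really! This is a sample sentence."))  # !.
```

Once the text is split into sentences, calling this function on each one gives the per-sentence pattern shown in the expected output.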

Expected output:

Sentences                                    Punct_pattern
Really! This is a sample sentence.            !.
The cat sat on the mat; the dog slept.        ;.

Please help.

Tags: python excel pandas nltk

Solution


Assuming, as in your example, that each sentence ends with a period and all sentences are in a single string:

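import pandas as pd
import numpy as np

my_str = "Really! This is a sample sentence. The cat sat on the mat; the dog slept."
delimiter = '.'

raw_sentences = pd.DataFrame()
# split on the delimiter; the empty string after the final '.' becomes NaN and is dropped
raw_sentences['sentences'] = pd.Series(my_str.split(delimiter)).replace('', np.nan).dropna()
# strip the leading space left by the split and add the delimiter back
raw_sentences['sentences'] = raw_sentences['sentences'].str.strip() + delimiter

# remove letters, digits and whitespace so that only punctuation is left
# (regex=True is required: since pandas 2.0, str.replace treats the pattern literally by default)
raw_sentences['punct_pattern'] = raw_sentences['sentences'].str.replace(r'[A-Za-z0-9\s]', '', regex=True)
raw_sentences.to_excel('file.xlsx', index=False)  # needs an Excel engine such as openpyxl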

Output:

print(raw_sentences)
                                 sentences punct_pattern
0       Really! This is a sample sentence.            !.
1   The cat sat on the mat; the dog slept.            ;.
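A character class used for this kind of cleanup is easy to get subtly wrong. A range written A-z spans ASCII 65 to 122, which also covers [ \ ] ^ _ and the backtick, and inside a class \s\s+ does not mean "one or more spaces": it just lists \s twice plus a literal +. The sample string below is made up for illustration:

```python
import re

# A-z (ASCII 65..122) also matches [ \ ] ^ _ and `
loose = re.compile(r"[a-zA-z0-9\s]")
# A-Z matches only the uppercase letters
strict = re.compile(r"[A-Za-z0-9\s]")

sample = "snake_case [note]"  # illustrative input
print(repr(loose.sub("", sample)))   # '' : _ [ ] were removed along with the letters
print(repr(strict.sub("", sample)))  # '_[]'
```

With the loose class, underscores and square brackets would silently disappear from the punctuation pattern, even though [ and ] are in the question's punctuation list.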

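Note that this solution does not use the punctuations string you defined. It builds the new column by deleting letters, digits, and whitespace from each sentence with str.replace, so every remaining character ends up in the pattern, including punctuation that is not in your list (an apostrophe, for example).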
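If only the characters in your punctuations string should count, a minimal variant (same period-only sentence splitting assumption as above) keeps those characters instead of deleting everything else. The column names follow the expected output; the to_excel call is left commented because writing .xlsx needs an engine such as openpyxl installed:

```python
import pandas as pd

# Punctuation set from the question
punctuations = '''.?![]()"",:;-/'''

my_str = "Really! This is a sample sentence. The cat sat on the mat; the dog slept."
delimiter = "."

# Split on the delimiter, drop empty pieces, strip spaces and restore the delimiter
pieces = [s.strip() for s in my_str.split(delimiter) if s.strip()]
df = pd.DataFrame({"Sentences": [s + delimiter for s in pieces]})

# Keep only characters listed in `punctuations`, preserving their order
df["Punct_pattern"] = df["Sentences"].apply(
    lambda s: "".join(c for c in s if c in punctuations)
)

print(df)
# df.to_excel("file.xlsx", index=False)  # requires an Excel engine such as openpyxl
```

Because the filter is driven by punctuations, characters such as apostrophes or underscores are ignored unless you add them to that string.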

Hope this helps.

