get_next returns None when grouping sentences with groupby in Python

Problem description

In my NER code I am trying to get each sentence back as a list of tuples and print it:

[(Token_1, PoS_1, Tag_1), ..., (Token_n, PoS_n, Tag_n)]

In the function "get_next" I get a None result every time, and I don't know what I am doing wrong here. The code:

class SentenceGetter(object):

    def __init__(self, data):

        self.n_sent = 1    
        self.data = data
        self.empty = False
        agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Lemat"].values.tolist(),
                                                           s["POS"].values.tolist(),
                                                           s["TAG"].values.tolist())]
        self.grouped = self.data.groupby("Forma").apply(agg_func)
        self.sentences = [s for s in self.grouped]
    
    def get_next(self):
        try:
            s = self.grouped["Forma: {}".format(self.n_sent)]
            self.n_sent += 1
            return s
        except:
            return None
        
getter = SentenceGetter(data)
sent = getter.get_next()
print('Example sentence:')
print(sent)

The code block that defines the data:

data = pd.read_csv("nkjp-morph-named.txt",delimiter="\t")
data = data.fillna(method="ffill")

print("Form number: ", len(data.groupby(['Forma'])))

lemats = list(set(data["Lemat"].values))
n_lemats = len(lemats)
print("Lemats: ", n_lemats)

tags = list(set(data["TAG"].values))
print("TAG:", tags)
n_tags = len(tags)
print("Number of TAGs: ", n_tags)

print("Dataset:")
data.head(n=16)

Can you help me understand why it still returns None? Expected result:

Example sentence: [('Thousands', 'NNS', 'O'), ('of', 'IN', 'O'), ('demonstrators', 'NNS', 'O'), ('have', 'VBP', 'O'), ('marched', 'VBN', 'O'), ('through', 'IN', 'O'), ('London', 'NNP', 'B-geo'), ('to', 'TO', 'O'), ('protest', 'VB', 'O'), ('the', 'DT', 'O'), ('war', 'NN', 'O'), ('in', 'IN', 'O'), ('Iraq', 'NNP', 'B-geo'), ('and', 'CC', 'O'), ('demand', 'VB', 'O'), ('the', 'DT', 'O'), ('withdrawal', 'NN', 'O'), ('of', 'IN', 'O'), ('British', 'JJ', 'B-gpe'), ('troops', 'NNS', 'O'), ('from', 'IN', 'O'), ('that', 'DT', 'O'), ('country', 'NN', 'O'), ('.', '.', 'O')]

Tags: python, function, nlp

Solution
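A likely cause, offered as a guess since the input file is not shown: after groupby("Forma"), the index of self.grouped holds the actual values found in the "Forma" column. get_next then looks up the string "Forma: 1", and unless the column really contains values spelled exactly like that, the lookup raises a KeyError. The bare except catches it and returns None, so the real error never shows. Printing self.grouped.index[:5] should reveal what the keys actually look like. It is also worth checking whether "Forma" identifies sentences at all; if it holds the word form of each token, the grouping is per word rather than per sentence.

The sketch below shows one way to structure the class under the assumption that the data has a column identifying the sentence of each token. The column name "sentence_id", the group_col parameter, and the toy rows are hypothetical and not taken from the original file.

```python
import pandas as pd

# Hypothetical toy data. "sentence_id" is an assumed column name;
# the real file may mark sentence boundaries differently.
data = pd.DataFrame({
    "sentence_id": [1, 1, 1, 2, 2],
    "Lemat": ["Thousands", "have", "marched", "Troops", "left"],
    "POS": ["NNS", "VBP", "VBN", "NNS", "VBD"],
    "TAG": ["O", "O", "O", "O", "O"],
})


class SentenceGetter:
    def __init__(self, data, group_col="sentence_id"):
        self.n_sent = 1
        self.data = data
        # Map each group key to its list of (lemma, POS, tag) tuples.
        self.grouped = {
            key: list(zip(frame["Lemat"], frame["POS"], frame["TAG"]))
            for key, frame in data.groupby(group_col, sort=False)
        }
        self.sentences = list(self.grouped.values())

    def get_next(self):
        # Look up by the key as it is stored in the column (an integer here),
        # and catch only KeyError so other bugs are not hidden as None.
        try:
            s = self.grouped[self.n_sent]
        except KeyError:
            return None
        self.n_sent += 1
        return s


getter = SentenceGetter(data)
print("Forma: 1" in getter.grouped)  # the formatted string is not a key here
print("Example sentence:")
print(getter.get_next())
```

If the sentence column in your file does contain strings such as "Sentence: 1", keep the formatted lookup but make the format string match those values exactly, including the prefix and spacing.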

