How do I assign a label to each token in my dataframe?

Problem description

I want to change the structure of my dataframe by transforming it into a different layout.

For example, the first row looks like this:

[Yes, it's annoying and cumbersome to separate your rubbish properly all the time., Three different bin bags stink away in the kitchen and have to be sorted into different wheelie bins., But still Germany produces way too much rubbish, and too many resources are lost when what actually should be separated and recycled is burnt., We Berliners should take the chance and become pioneers in waste separation!]

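As you can see, in each row the Paragraph column holds several sentences and the label column holds the corresponding labels. I want to split every sentence into tokens and give each token the label of its sentence along with its POS tag, like this: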

[(Yes, UH, 0), (,, ,, 0), (it, PRP, 0), ('s, VBZ, 0), (annoying, JJ, 0), (and, CC, 0), (cumbersome, JJ, 0), (to, TO, 0), (separate, VB, 0), (your, PRP$, 0), (rubbish, NN, 0), (properly, RB, 0), (all, PDT, 0), (the, DT, 0), (time, NN, 0), (., ., 0),...]

I have already tokenized the text and assigned POS tags with spaCy:

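import spacy

# Load the small English pipeline (tokenizer and POS tagger).
nlp = spacy.load("en_core_web_sm")

def mapTokenPos(sentences):
    """Return one list of (token text, POS tag) pairs per sentence."""
    tokenlist = []
    for sentence in sentences:
        doc = nlp(sentence)
        tokenlist.append([(token.text, token.tag_) for token in doc])
    return tokenlist

# Each cell of df["Paragraph"] is a list of sentence strings.
df["IOB"] = df["Paragraph"].apply(mapTokenPos)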

But I don't know how to assign the labels to the tokens.

Tags: python, dataframe, nlp

Solution
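One possible approach, offered as a minimal sketch rather than a confirmed answer: once every sentence has been turned into (token, POS tag) pairs, each pair can be extended with the label of the sentence it came from. The helper name `attach_labels` is hypothetical, and the sketch assumes each row carries exactly one label per sentence, in the same order as the sentences. The sample data is hand-written so the block runs without spaCy.

```python
def attach_labels(tagged_sentences, sentence_labels):
    """Return a flat list of (token, pos, label) triples for one paragraph.

    tagged_sentences: list of sentences, each a list of (token, pos) pairs,
        i.e. the shape returned by mapTokenPos in the question.
    sentence_labels: one label per sentence, in the same order (an assumption).
    """
    if len(tagged_sentences) != len(sentence_labels):
        raise ValueError("expected exactly one label per sentence")
    return [
        (text, pos, label)
        for sentence, label in zip(tagged_sentences, sentence_labels)
        for text, pos in sentence
    ]


# Hand-written sample standing in for one dataframe row.
tagged = [
    [("Yes", "UH"), (",", ","), ("it", "PRP")],
    [("Three", "CD"), ("bins", "NNS")],
]
labels = [0, 1]  # hypothetical labels, one per sentence

labelled = attach_labels(tagged, labels)
print(labelled)
```

With the dataframe from the question this could be applied row by row, for instance `df["IOB"] = [attach_labels(t, l) for t, l in zip(df["IOB"], df["Label"])]`, where the column name `Label` is an assumption because the question does not state the actual name of the label column. If a nested result (one list per sentence) is preferred over a flat one, the comprehension can be split into an inner and an outer list instead.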

