How do I standardize labels in a Python list?

Problem description

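I am trying to analyze conversations and need a way to standardize the speaker labels. Each conversation is a list of sublists, and each sublist contains two strings: one for the speaker's ID and one for the actual utterance: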

myconvo = [['bob','hello alice'],['alice','hello bob'],['bob','goodbye alice'],['alice','goodbye bob']]

Ultimately, I would like to end up with the following, where the speaker labels have been standardized:

myconvo = [['speaker1','hello alice'],['speaker2','hello bob'],['speaker1','goodbye alice'],['speaker2','goodbye bob']]

Given that every conversation will have different speakers, I am a bit at a loss as to how to proceed.

So far, I have identified...

# empty set to store speaker labels
speakers = set()

# iterate through convo adding speaker names
for sub in myconvo:
    if sub[0] not in speakers:
        speakers.add(sub[0])

# convert to list to access index (where position 0 will be the first speaker, position 1 will be second speaker etc.)
speakers = list(speakers)

I am not sure where to go next, or whether there is a shorter way to solve this.

Tags: python, list

Solution


myconvo = [['bob','hello alice'],['alice','hello bob'],['bob','goodbye alice'],['alice','goodbye bob']]

speakers = {}
count = 1
# get each unique speaker name, in order of first appearance, and assign it a speaker number
for turn in myconvo:
    if turn[0] not in speakers:
        speakers[turn[0]] = 'speaker{}'.format(count)
        count += 1

# replace each name with its speaker number (this modifies myconvo in place)
for turn in myconvo:
    turn[0] = speakers[turn[0]]

print(myconvo)

Output

[['speaker1', 'hello alice'], ['speaker2', 'hello bob'], ['speaker1', 'goodbye alice'], ['speaker2', 'goodbye bob']]
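
Since the question also asks about a shorter way, here is one possible sketch of the same idea wrapped in a function. The function name `standardize_speakers` and its `prefix` parameter are made up for illustration and do not come from the original answer. It uses `dict.setdefault` to assign the next number the first time a name is seen, and it builds a new list instead of modifying the input:

```python
def standardize_speakers(convo, prefix="speaker"):
    """Return a new conversation with names replaced by numbered labels.

    Labels are numbered by order of first appearance; the input is not modified.
    (Hypothetical helper, named here only for illustration.)
    """
    labels = {}
    result = []
    for name, utterance in convo:
        # setdefault stores the new label only if the name has not been seen yet
        label = labels.setdefault(name, "{}{}".format(prefix, len(labels) + 1))
        result.append([label, utterance])
    return result


myconvo = [['bob', 'hello alice'], ['alice', 'hello bob'],
           ['bob', 'goodbye alice'], ['alice', 'goodbye bob']]
standardized = standardize_speakers(myconvo)
print(standardized)
```

Because a plain dict keeps insertion order and the label is assigned on first sight, the first speaker in the conversation always becomes speaker1. A set converted to a list, as in the question, does not guarantee that ordering.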
