IBM Tone Analyzer output has duplicate values in a pandas DataFrame

Problem description

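I am running sentiment analysis on news from newsapi, followed by tone analysis. I can display the output of both the sentiment analysis and the tone analyzer in a pandas DataFrame. The problem is that the IBM Tone Analyzer output contains duplicate values. I want the values in each row to be specific to that row. Here are the code and output: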

from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator(apikey)
ta = ToneAnalyzerV3(version='2017-09-21', authenticator=authenticator)
ta.set_service_url(url)

result = []
for i in new_df['description']:
    tone_analysis = ta.tone(
        {'text': i},
        # 'application/json'
    ).get_result()
    result.append(tone_analysis)

If I run print(result), I get output like this: [{'document_tone': {'tones': [{'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}}. There are many more values like this one.

If I just evaluate result on its own, I get similar output, but in a different format.

There seems to be some difference between using result and print(result).

Next, I tried to put the values into a pandas DataFrame with the following code:

def f(x):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    return pd.Series(x[0])

new_df = new_df.join(new_df['description'].apply(f))

The last three columns ('score', 'tone_id', 'tone_name') contain the same values in every row, which is the problem. Moreover, the repeated values are the last entry of the output from print(result).

Tags: python, pandas, ibm-watson

Solution


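The function f(x) never uses its argument x. It reads the global variable i, which still holds the last description left over from the earlier for loop, so every row receives the tones of that last description. In addition, each row can contain a list of several tone dictionaries, so the function below flattens that list with a dict comprehension and enumerate, creating new column names with a numeric suffix: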

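# Change f(x) to f(i) so the function uses its own argument
def f(i):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    # Flatten the list of tone dicts into one Series: score_0, tone_id_0, ...
    return pd.Series({f'{k}_{n}': v for n, y in enumerate(x)
                      for k, v in y.items()}, dtype=object)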

# Keep the tones in a separate variable so new_df still has its 'description' column
tone_df = new_df['description'].apply(f)
print(tone_df)
     score_0   tone_id_0 tone_name_0   score_1   tone_id_1 tone_name_1
0   0.677676  analytical  Analytical       NaN         NaN         NaN
1   0.620279  analytical  Analytical       NaN         NaN         NaN
2   0.683108     sadness     Sadness       NaN         NaN         NaN
3   0.920855  analytical  Analytical       NaN         NaN         NaN
4   0.825035   confident   Confident       NaN         NaN         NaN
5   0.632229         joy         Joy  0.527569   tentative   Tentative
6        NaN         NaN         NaN       NaN         NaN         NaN
7   0.574650     sadness     Sadness       NaN         NaN         NaN
8        NaN         NaN         NaN       NaN         NaN         NaN
9   0.751512   confident   Confident       NaN         NaN         NaN
10  0.618451   confident   Confident       NaN         NaN         NaN
11  0.672469  analytical  Analytical  0.912588   confident   Confident
12  0.764412   tentative   Tentative  0.840583  analytical  Analytical
13  0.660207   confident   Confident       NaN         NaN         NaN
14  0.840583  analytical  Analytical  0.764412   tentative   Tentative
15  0.786991   tentative   Tentative       NaN         NaN         NaN
16  0.753348     sadness     Sadness       NaN         NaN         NaN
17  0.672469  analytical  Analytical  0.912588   confident   Confident
18  0.590326     sadness     Sadness  0.877080   tentative   Tentative
19  0.560098  analytical  Analytical       NaN         NaN         NaN
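
The flattening step can be tried without calling the IBM service. The following is only a minimal sketch: SAMPLE_TONES, flatten_tones and demo_df are hypothetical names introduced for illustration, and the dictionary stands in for the list that ta.tone(...).get_result()['document_tone']['tones'] would return (the values are copied from rows 0, 5 and 6 of the output above).

```python
import pandas as pd

# Hypothetical stand-in for the API response; not real calls to ToneAnalyzerV3.
SAMPLE_TONES = {
    'text a': [{'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}],
    'text b': [{'score': 0.632229, 'tone_id': 'joy', 'tone_name': 'Joy'},
               {'score': 0.527569, 'tone_id': 'tentative', 'tone_name': 'Tentative'}],
    'text c': [],  # no tone detected for this text
}

def flatten_tones(text):
    tones = SAMPLE_TONES[text]
    # One key per (field, position) pair: score_0, tone_id_0, ..., score_1, ...
    return pd.Series({f'{k}_{n}': v for n, tone in enumerate(tones)
                      for k, v in tone.items()}, dtype=object)

demo_df = pd.DataFrame({'description': ['text a', 'text b', 'text c']})
flat = demo_df['description'].apply(flatten_tones)
print(flat)
```

Rows with fewer tones than the widest row are padded with NaN, and a row with no tones at all comes out entirely NaN, which matches rows 6 and 8 in the output above.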

To add these columns to the original DataFrame:

new_df = new_df.join(new_df['description'].apply(f))
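
The repeated values from the question can also be reproduced without the IBM service. This is a minimal sketch, assuming the cause is that the original f(x) reads the leftover loop variable i instead of its argument x. The names df, buggy, fixed and the column 'seen' are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'description': ['first', 'second', 'third']})

# Mimic the earlier for loop: afterwards the global i is bound to the last item.
for i in df['description']:
    pass

def buggy(x):
    # Ignores x and reads the global i, so every row sees 'third'.
    return pd.Series({'seen': i})

def fixed(text):
    # Uses its own argument, so each row sees its own description.
    return pd.Series({'seen': text})

buggy_out = df.join(df['description'].apply(buggy))
fixed_out = df.join(df['description'].apply(fixed))
print(buggy_out)
print(fixed_out)
```

In buggy_out the 'seen' column repeats the last description in every row, just as the tone columns did in the question, while fixed_out carries a per-row value.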
