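python - IBM Tone Analyzer output in a pandas DataFrame has duplicate values
Problem description
I am doing sentiment analysis on newsapi articles, followed by tone analysis. I can display the output of both the sentiment analysis and the Tone Analyzer in a pandas DataFrame. The problem is that the IBM Tone Analyzer output contains duplicate values, and I want the values in each row to be unique. Here is the code and the output: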
from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# apikey and url are the service credentials, defined elsewhere
authenticator = IAMAuthenticator(apikey)
ta = ToneAnalyzerV3(version='2017-09-21', authenticator=authenticator)
ta.set_service_url(url)

# collect one Tone Analyzer response per description
result = []
for i in new_df['description']:
    tone_analysis = ta.tone(
        {'text': i},
        # content_type='application/json'
    ).get_result()
    result.append(tone_analysis)
If I run print(result), I get output like this: [{'document_tone': {'tones': [{'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}}
. There are many values like this.
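If I just enter result, I get similar output, but in a different format, as shown below:
(screenshot of the result output)
There seems to be some problem with using result versus print(result).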
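Each element of result is one response per description, and the tones for that description sit under ['document_tone']['tones'] as a list that can hold zero, one, or several tone dicts. The sketch below walks that structure offline; the sample list is hypothetical, built from scores that appear in this thread, and only stands in for real API responses.

```python
# Hypothetical sample shaped like the Tone Analyzer responses collected in `result`
# (scores copied from the output table in this thread; not a live API response).
result = [
    {'document_tone': {'tones': [
        {'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}},
    {'document_tone': {'tones': [
        {'score': 0.632229, 'tone_id': 'joy', 'tone_name': 'Joy'},
        {'score': 0.527569, 'tone_id': 'tentative', 'tone_name': 'Tentative'}]}},
    {'document_tone': {'tones': []}},  # no tone detected for this description
]

# One list of tone ids per description; the length varies from row to row.
tone_ids = [[t['tone_id'] for t in r['document_tone']['tones']] for r in result]
print(tone_ids)  # [['analytical'], ['joy', 'tentative'], []]
```

Because the number of tones varies per description, a fixed set of three columns cannot hold every tone of every row.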
Next, I tried to put the values into a pandas DataFrame with the following code:
def f(x):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    return pd.Series(x[0])

new_df = new_df.join(new_df['description'].apply(f))
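The last three features, i.e. "score", "tone_id" and "tone_name", show the same output in every row, and that is the problem. Also, the repeated values are the last values shown by print(result). The output looks like this:
(screenshot of the output)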
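The repetition can be reproduced without the IBM service. In the sketch below, fake_tone is a hypothetical stand-in for ta.tone(...).get_result() that writes the text it receives into tone_id, so it is visible which description was actually analyzed; the three-row frame is also made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for ta.tone({'text': ...}).get_result(): it puts the
# text it received into tone_id so we can see which description was analyzed.
def fake_tone(text):
    return {'document_tone': {'tones': [
        {'score': 0.5, 'tone_id': text, 'tone_name': text.title()}]}}

new_df = pd.DataFrame({'description': ['first', 'second', 'third']})

# Same loop as in the question; afterwards the global i still holds 'third'.
result = []
for i in new_df['description']:
    result.append(fake_tone(i))

def f(x):
    # Bug reproduced: the global i is analyzed, not the argument x.
    x = fake_tone(i)['document_tone']['tones']
    return pd.Series(x[0])

def f_fixed(x):
    # The argument itself is analyzed.
    tones = fake_tone(x)['document_tone']['tones']
    return pd.Series(tones[0])

buggy = new_df['description'].apply(f)['tone_id'].tolist()
fixed = new_df['description'].apply(f_fixed)['tone_id'].tolist()
print(buggy)  # ['third', 'third', 'third']
print(fixed)  # ['first', 'second', 'third']
```

Even with the argument passed correctly, x[0] keeps only the first tone and raises IndexError for a description that has no tones.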
Solution
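Each row has a list of several dictionaries, so the answer flattens them with a nested comprehension and uses enumerate to build new column names with a numeric suffix. Note also that the original f(x) never used x: it sent the global i, which still holds the last description after the earlier for loop, so every row received the tones of the last description. That is why the values repeat and match the last entry of print(result).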
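import pandas as pd

# change f(x) to f(i): the parameter must be the description text
def f(i):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    # one column group per tone: score_0, tone_id_0, tone_name_0, score_1, ...
    return pd.Series({f'{k}_{n}': v for n, y in enumerate(x)
                      for k, v in y.items()}, dtype=object)

new_df = new_df['description'].apply(f)
print(new_df)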
score_0 tone_id_0 tone_name_0 score_1 tone_id_1 tone_name_1
0 0.677676 analytical Analytical NaN NaN NaN
1 0.620279 analytical Analytical NaN NaN NaN
2 0.683108 sadness Sadness NaN NaN NaN
3 0.920855 analytical Analytical NaN NaN NaN
4 0.825035 confident Confident NaN NaN NaN
5 0.632229 joy Joy 0.527569 tentative Tentative
6 NaN NaN NaN NaN NaN NaN
7 0.574650 sadness Sadness NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 0.751512 confident Confident NaN NaN NaN
10 0.618451 confident Confident NaN NaN NaN
11 0.672469 analytical Analytical 0.912588 confident Confident
12 0.764412 tentative Tentative 0.840583 analytical Analytical
13 0.660207 confident Confident NaN NaN NaN
14 0.840583 analytical Analytical 0.764412 tentative Tentative
15 0.786991 tentative Tentative NaN NaN NaN
16 0.753348 sadness Sadness NaN NaN NaN
17 0.672469 analytical Analytical 0.912588 confident Confident
18 0.590326 sadness Sadness 0.877080 tentative Tentative
19 0.560098 analytical Analytical NaN NaN NaN
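The loop in the question already stores every response in result, so the same flattening can be applied to that list without calling the service a second time per row. A minimal sketch, assuming result holds one response per row of new_df in the same order; the helper name flatten_tones and the sample responses are hypothetical.

```python
import pandas as pd

def flatten_tones(response):
    """Turn one Tone Analyzer response into a flat dict: score_0, tone_id_0, ..."""
    tones = response['document_tone']['tones']
    return {f'{k}_{n}': v for n, tone in enumerate(tones) for k, v in tone.items()}

# Hypothetical responses in the same shape as the collected `result` list.
result = [
    {'document_tone': {'tones': [
        {'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}},
    {'document_tone': {'tones': []}},
    {'document_tone': {'tones': [
        {'score': 0.632229, 'tone_id': 'joy', 'tone_name': 'Joy'},
        {'score': 0.527569, 'tone_id': 'tentative', 'tone_name': 'Tentative'}]}},
]

# Each dict becomes one row; keys missing from a row (fewer tones) become NaN.
tones_df = pd.DataFrame([flatten_tones(r) for r in result])
print(tones_df)
```

If result lines up with new_df row for row, passing index=new_df.index to pd.DataFrame(...) gives a frame that can be joined back onto new_df.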
To keep the original columns, join the result to the original DataFrame instead of reassigning new_df:
new_df = new_df.join(new_df['description'].apply(f))
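join aligns on the index, so each row's score_n, tone_id_n and tone_name_n columns land next to the description they came from while the original columns are kept. A small offline sketch; fake_tone and the two-row frame are hypothetical stand-ins for the API call and the news data.

```python
import pandas as pd

# Hypothetical stand-in for ta.tone({'text': text}).get_result():
# an empty description yields no tones, anything else yields one.
def fake_tone(text):
    tones = [] if text == '' else [
        {'score': 0.9, 'tone_id': 'joy', 'tone_name': 'Joy'}]
    return {'document_tone': {'tones': tones}}

def f(text):
    tones = fake_tone(text)['document_tone']['tones']
    return pd.Series({f'{k}_{n}': v for n, tone in enumerate(tones)
                      for k, v in tone.items()}, dtype=object)

new_df = pd.DataFrame({'title': ['a', 'b'], 'description': ['good news', '']})
# Rows without tones get NaN in the new columns after the join.
new_df = new_df.join(new_df['description'].apply(f))
print(new_df)
```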