Error when translating a DataFrame with the Google Translate API

Problem description

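I'm trying to translate part of the SQuAD 1.1 dataset into Sinhala. I don't know whether I can use the JSON file directly for translation, so what I've tried so far is building a small DataFrame from the SQuAD dataset and translating it as a demo. However, I keep getting different errors; the one I'm currently stuck on is below. Can you help me fix this error, or suggest a better way to do this task in Python?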
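
On whether the JSON file can be used directly: a SQuAD 1.1 file is nested as `data` → `paragraphs` → `context` plus `qas` → `question` / `answers`, so one option is to walk that structure with the standard `json` module and translate only the string fields, leaving numeric fields such as `answer_start` alone. This is a minimal sketch, not a library API: `translate_squad` and the `translate_text(text, dest)` callable are hypothetical names, and `fake_translate` is an offline stand-in for a real translator.

```python
import copy
import json
from typing import Callable


def translate_squad(squad: dict, translate_text: Callable[[str, str], str], dest: str = "si") -> dict:
    """Return a translated deep copy of a SQuAD 1.1-style dict.

    Only the text fields (context, question, answer text) are translated;
    `id` and `answer_start` are copied unchanged. `answer_start` is a character
    offset into the ORIGINAL context, so it will usually not match the
    translated context.
    """
    result = copy.deepcopy(squad)  # leave the input dict untouched
    for article in result["data"]:
        for para in article["paragraphs"]:
            para["context"] = translate_text(para["context"], dest)
            for qa in para["qas"]:
                qa["question"] = translate_text(qa["question"], dest)
                for answer in qa["answers"]:
                    answer["text"] = translate_text(answer["text"], dest)
    return result


# Tiny hypothetical SQuAD-shaped sample so the sketch runs offline.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Demo",
        "paragraphs": [{
            "context": "Colombo is in Sri Lanka.",
            "qas": [{
                "id": "q1",
                "question": "Where is Colombo?",
                "answers": [{"answer_start": 14, "text": "Sri Lanka"}],
            }],
        }],
    }],
}


def fake_translate(text: str, dest: str) -> str:
    """Offline stand-in for a real translator: tags the text with the target code."""
    return f"[{dest}] {text}"


translated = translate_squad(sample, fake_translate)
# ensure_ascii=False keeps non-ASCII (e.g. Sinhala) characters readable in the output.
print(json.dumps(translated, ensure_ascii=False, indent=2))
```

For the real file, the input could come from `json.load` on `example2.json` (opened with `encoding="utf-8"`), and the translator could be `lambda t, d: translator.translate(t, dest=d).text` with a googletrans `Translator`; writing the result back with `json.dump(..., ensure_ascii=False)` keeps the Sinhala text readable.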

```python
import googletrans
from googletrans import Translator

import os
from google.cloud import translate_v2 as translate

os.environ['GOOGLE_APPLICATION_CREDENTIALS']=r"C:\Users\Sathsara\Documents\Python Learning\Translation test\translationAPI\flash-medley-278816-b2012b874797.json"

# create a translator object
translator = Translator()

# translate a string; googletrans defaults to English, so dest='si' selects Sinhala
translated = translator.translate('I am Sathsara Rasantha',dest='si')

# the translate method returns an object
print(translated)


# obtain translated string by using attribute .text
translated.text

import pandas as pd


translate_example = pd.read_json("example2.json")
translate_example

# flatten the nested SQuAD JSON into one row per question
contexts = []
questions = []
answers_text = []
answers_start = []
for i in range(translate_example.shape[0]):
    topic = translate_example.iloc[i,0]['paragraphs']
    for sub_para in topic:
        for q_a in sub_para['qas']:
            questions.append(q_a['question'])
            answers_start.append(q_a['answers'][0]['answer_start'])
            answers_text.append(q_a['answers'][0]['text'])
            contexts.append(sub_para['context'])
df = pd.DataFrame({"context":contexts, "question": questions, "answer_start": answers_start, "text": answers_text})
df
df=df.loc[0:2,:]
df


# make a deep copy of the data frame
df_si = df.copy()

# translate the column names with rename (no dest given, so the target is English)
df_si.rename(columns=lambda x: translator.translate(x).text, inplace=True)


df_si.columns


translations = {}
for column in df_si.columns:
    # unique elements of the column
    unique_elements = df_si[column].unique()
    for element in unique_elements:
        # add translation to the dictionary
        translations[element] = translator.translate(element,dest='si').text

print(translations)

# replace every cell value with its translation from the dictionary
df_si.replace(translations, inplace = True)

# check translation
df_si.head()
```

This is the error I get:

> --------------------------------------------------------------------------- TypeError                                 Traceback (most recent call last)
> <ipython-input-24-f55a5ca59c36> in <module>
>       5     for element in unique_elements:
>       6         # add translation to the dictionary
> ----> 7         translations[element] = translator.translate(element,dest='si').text
>       8 
>       9 print(translations)
> 
> ~\Anaconda3\lib\site-packages\googletrans\client.py in translate(self, text, dest, src)
>     170 
>     171         origin = text
> --> 172         data = self._translate(text, dest, src)
>     173 
>     174         # this code will be updated when the format is changed.
> 
> ~\Anaconda3\lib\site-packages\googletrans\client.py in _translate(self, text, dest, src)
>      73             text = text.decode('utf-8')
>      74 
> ---> 75         token = self.token_acquirer.do(text)
>      76         params = utils.build_params(query=text, src=src, dest=dest,
>      77                                     token=token)
> 
> ~\Anaconda3\lib\site-packages\googletrans\gtoken.py in do(self, text)
>     199     def do(self, text):
>     200         self._update()
> --> 201         tk = self.acquire(text)
>     202         return tk
> 
> ~\Anaconda3\lib\site-packages\googletrans\gtoken.py in acquire(self, text)
>     144         a = []
>     145         # Convert text to ints
> --> 146         for i in text:
>     147             val = ord(i)
>     148             if val < 0x10000:
> 
> TypeError: 'numpy.int64' object is not iterable
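
The traceback points at the cause: the loop over `df_si.columns` also visits `answer_start`, whose values are integers. `Series.unique()` on that column returns a NumPy array, so each `element` is a `numpy.int64`, and googletrans's token code iterates over its input character by character (`for i in text:` in `gtoken.py`), which fails for a number. The snippet below reproduces that offline, using a hypothetical one-row DataFrame with the same columns as the question's `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row stand-in for the question's `df` (same columns).
df = pd.DataFrame({
    "context": ["Colombo is in Sri Lanka."],
    "question": ["Where is Colombo?"],
    "answer_start": [14],
    "text": ["Sri Lanka"],
}).astype({"answer_start": "int64"})  # int64, matching the type in the traceback

print(df.dtypes)  # answer_start is int64; the other columns hold Python str values

element = df["answer_start"].unique()[0]
print(type(element))

# googletrans's gtoken.acquire() runs `for i in text:`; a NumPy integer is not iterable.
error_message = None
try:
    for ch in element:
        pass
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'numpy.int64' object is not iterable
```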

Tags: python, json, api, nlp, google-translate

Solution
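
A possible fix, sketched under the assumption that only the text columns should be translated: keep the question's approach (loop over the columns and build a dictionary of translations) but skip any value that is not a `str`, so the integer `answer_start` column never reaches `translator.translate`. `translate_text_cells` and `recompute_answer_start` are hypothetical helpers, and `str.upper` stands in for a real translator so the example runs offline. Because `answer_start` is a character offset into the original English context, the second helper re-locates each translated answer inside its translated context with `str.find`.

```python
from typing import Callable

import pandas as pd


def translate_text_cells(df: pd.DataFrame, translate_text: Callable[[str], str]) -> pd.DataFrame:
    """Translate string cells column by column; non-string values (e.g. answer_start) are left as-is."""
    out = df.copy()
    translations = {}  # original -> translated, so each distinct string is translated once
    for column in out.columns:
        strings = [v for v in out[column].unique() if isinstance(v, str)]
        if not strings:
            continue  # numeric column such as answer_start: nothing to translate
        for value in strings:
            if value not in translations:
                translations[value] = translate_text(value)
        # Map each cell through the lookup table; values not in it are kept unchanged.
        out[column] = out[column].map(lambda v: translations.get(v, v))
    return out


def recompute_answer_start(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: find each translated answer inside its translated context.

    str.find returns -1 when the answer is not a verbatim substring of the context.
    """
    out = df.copy()
    out["answer_start"] = [ctx.find(ans) for ctx, ans in zip(out["context"], out["text"])]
    return out


df = pd.DataFrame({
    "context": ["Colombo is in Sri Lanka.", "Kandy is a city."],
    "question": ["Where is Colombo?", "What is Kandy?"],
    "answer_start": [14, 11],
    "text": ["Sri Lanka", "city"],
})

# Offline stand-in for a real translator so the sketch runs without network access.
fake_translate = str.upper

df_si = recompute_answer_start(translate_text_cells(df, fake_translate))
print(df_si)
```

With googletrans, the callable could be `lambda t: translator.translate(t, dest='si').text`. Note that googletrans is an unofficial client and does not read `GOOGLE_APPLICATION_CREDENTIALS`; that variable is used by the official `google-cloud-translate` client, so if the Cloud credentials are what you intend to use, the `translate_v2` module imported in the question could be wrapped instead, e.g. `lambda t: client.translate(t, target_language='si')['translatedText']` with `client = translate.Client()`. Mapping through the dictionary with `Series.map` is used here instead of `DataFrame.replace` so that only exact cell values are swapped. Expect some `-1` values from `recompute_answer_start` with real translations: an answer translated on its own often differs from how it appears inside the translated context, and those rows need manual correction or removal.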

