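python - Bad Request error when trying to create a glossary
Problem description
I want to create a unidirectional glossary for my translation project using the sample command from Google's how-to guide: https://cloud.google.com/translate/docs/advanced/glossary#unidirectional_glossary
There is no Python sample for creating a unidirectional glossary, only one for an equivalent term sets glossary, and I don't know what to change in the code.
I created a bucket and uploaded my glossary file.
Then I tried to run this command in PowerShell: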
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://translation.googleapis.com/v3/projects/[HIDDEN]/locations/us-east1/glossaries" | Select-Object -Expand Content
This is the content of the request.json file, based on their example:
{
  "name": "projects/[HIDDEN]/locations/us-east1/glossaries/kittglossary",
  "languagePair": {
    "sourceLanguageCode": "en",
    "targetLanguageCode": "hu"
  },
  "inputConfig": {
    "gcsSource": {
      "inputUri": "gs://kittgloss/glossary.csv"
    }
  }
}
I get this error back:
Invoke-WebRequest : The remote server returned an error: (400) Bad Request.
At line:4 char:1
+ Invoke-WebRequest `
+ ~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
+ FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
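The 400 above carries no detail because Invoke-WebRequest raises before the response body is printed. As a rough sketch only, the same request can be built with Python's standard library so that the error body is readable. The function names build_glossary_request and send are hypothetical, the token is assumed to come from "gcloud auth application-default print-access-token", and nothing here has been run against a live project.

```python
import json
import urllib.error
import urllib.request


def build_glossary_request(project_id, location, glossary_id,
                           source, target, input_uri, token):
    """Build (but do not send) the POST request for a unidirectional glossary."""
    parent = f"projects/{project_id}/locations/{location}"
    body = {
        "name": f"{parent}/glossaries/{glossary_id}",
        "languagePair": {
            "sourceLanguageCode": source,
            "targetLanguageCode": target,
        },
        "inputConfig": {"gcsSource": {"inputUri": input_uri}},
    }
    return urllib.request.Request(
        url=f"https://translation.googleapis.com/v3/{parent}/glossaries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send(request):
    """Send the request; on an HTTP error, return the error body instead of raising."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The error body is usually JSON describing why the request was rejected.
        return err.read().decode("utf-8")
```

Calling send(build_glossary_request(...)) with real values should return either the long-running operation or the server's explanation of the 400.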
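I do have the GOOGLE_APPLICATION_CREDENTIALS environment variable set, and implicit authentication was already working when I tested translation earlier.
I then tried the sample Python code for creating a glossary: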
from google.cloud import translate_v3 as translate

# Note: the trailing commas below make each value a one-element tuple, not a str.
pid = "[HIDDEN]",
iuri = "gs://kittgloss/glossary.csv",
gid = "kittglossary",


def create_glossary(
    project_id,
    input_uri,
    glossary_id,
    timeout,
):
    """
    Create an equivalent term sets glossary. Glossary can be words or
    short phrases (usually fewer than five words).
    https://cloud.google.com/translate/docs/advanced/glossary#format-glossary
    """
    client = translate.TranslationServiceClient()

    # Supported language codes: https://cloud.google.com/translate/docs/languages
    source_lang_code = "en"
    target_lang_code = "hu"
    location = "us-east1"  # The location of the glossary

    name = client.glossary_path(project_id, location, glossary_id)
    language_codes_set = translate.types.Glossary.LanguageCodesSet(
        language_codes=[source_lang_code, target_lang_code]
    )

    gcs_source = translate.types.GcsSource(input_uri=input_uri)
    input_config = translate.types.GlossaryInputConfig(gcs_source=gcs_source)
    glossary = translate.types.Glossary(
        name=name, language_codes_set=language_codes_set, input_config=input_config
    )

    parent = client.location_path(project_id, location)
    # glossary is a custom dictionary Translation API uses
    # to translate the domain-specific terminology.
    operation = client.create_glossary(parent=parent, glossary=glossary)

    result = operation.result(timeout)
    print("Created: {}".format(result.name))
    print("Input Uri: {}".format(result.input_config.gcs_source.input_uri))


create_glossary(pid, iuri, gid, timeout=180)
I get the following error back, complaining that the file URI is a tuple rather than a str:
Traceback (most recent call last):
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 702, in field_setter
new_value = type_checker.CheckValue(new_value)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\type_checkers.py", line 215, in CheckValue
raise TypeError(message)
TypeError: ('gs://kittgloss/glossary.csv',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 558, in init
setattr(self, field_name, field_value)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 704, in field_setter
raise TypeError(
TypeError: Cannot set google.cloud.translation.v3.GcsSource.input_uri to ('gs://kittgloss/glossary.csv',): ('gs://kittgloss/glossary.csv',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\py\glossary.py", line 48, in <module>
create_glossary(pid,iuri,gid,timeout=180)
File "C:\py\glossary.py", line 30, in create_glossary
gcs_source = translate.types.GcsSource(input_uri=input_uri)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\proto\message.py", line 421, in __init__
self.__dict__["_pb"] = self._meta.pb(**params)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 560, in init
_ReraiseTypeErrorWithFieldName(message_descriptor.name, field_name)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 477, in _ReraiseTypeErrorWithFieldName
six.reraise(type(exc), exc, sys.exc_info()[2])
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\six.py", line 702, in reraise
raise value.with_traceback(tb)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 558, in init
setattr(self, field_name, field_value)
File "C:\Users\pc\AppData\Local\Programs\Python\Python38-32\lib\site-packages\google\protobuf\internal\python_message.py", line 704, in field_setter
raise TypeError(
TypeError: Cannot set google.cloud.translation.v3.GcsSource.input_uri to ('gs://kittgloss/glossary.csv',): ('gs://kittgloss/glossary.csv',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>) for field GcsSource.input_uri
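The final TypeError traces back to the trailing commas after the three module-level assignments: in Python, a trailing comma after a value builds a one-element tuple. A small sketch of the effect, reusing the URI from the question (the variable names are only for illustration):

```python
# A trailing comma creates a one-element tuple, not a string.
iuri_with_comma = "gs://kittgloss/glossary.csv",
iuri_plain = "gs://kittgloss/glossary.csv"

print(type(iuri_with_comma).__name__)    # tuple
print(type(iuri_plain).__name__)         # str
print(iuri_with_comma[0] == iuri_plain)  # True
```

Dropping the commas after pid, iuri and gid should therefore clear this particular TypeError, independently of the 400 returned by the REST call.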
The glossary file is very simple; the first few lines look like this:
rear bumpers,hátsó lökhárító
front bumper spoiler,első lökhárító spoiler
front bumpers,első lökhárító
I would appreciate any help.
Solution
I solved my own problem by adding a header row to my CSV file, which turned my glossary into a simple EN-to-HU equivalent term sets glossary, like this:
First few lines:
en,hu,pos
rear bumpers,hátsó lökhárító,noun
front bumper spoiler,első lökhárító spoiler,noun
front bumpers,első lökhárító,noun
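For a longer file, the conversion from the two-column layout to the layout above could be scripted. This is only a sketch: the helper name to_equivalent_term_set is hypothetical, and tagging every row as "noun" is an assumption that simply mirrors the sample rows.

```python
import csv
import io


def to_equivalent_term_set(rows, languages=("en", "hu"), pos="noun"):
    """Return CSV text with a language header row and a trailing pos column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([*languages, "pos"])
    for row in rows:
        # Assumption: every term gets the same part-of-speech tag.
        writer.writerow([*row, pos])
    return out.getvalue()


original = "rear bumpers,hátsó lökhárító\nfront bumpers,első lökhárító\n"
pairs = list(csv.reader(io.StringIO(original)))
converted = to_equivalent_term_set(pairs)
print(converted)
```

The resulting text can be saved as UTF-8 and uploaded to the bucket in place of the original file.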
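Then I used the sample Python code, with a few modifications, to create the glossary. One problem I ran into is that glossaries can apparently only be created in us-central1 and global. I know my code doesn't look pretty with its hard-coded strings, but it works: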
from google.cloud import translate_v3beta1 as translate


def create_glossary():
    client = translate.TranslationServiceClient()

    # Set your project ID
    project_id = 'flawless-acre-284812'

    # Set the glossary ID you want
    glossary_id = 'kittglossaryv2'

    # Set your location
    location = 'us-central1'  # The location of the glossary

    name = client.glossary_path(
        project_id,
        location,
        glossary_id)

    language_codes_set = translate.types.Glossary.LanguageCodesSet(
        language_codes=['en', 'hu'])

    # Set the Cloud Storage URI of your glossary file
    gcs_source = translate.types.GcsSource(
        input_uri='gs://kittgloss/etglossaryv2.csv')

    input_config = translate.types.GlossaryInputConfig(
        gcs_source=gcs_source)

    glossary = translate.types.Glossary(
        name=name,
        language_codes_set=language_codes_set,
        input_config=input_config)

    # Hard-coded parent resource; must match project_id and location above
    parent = 'projects/flawless-acre-284812/locations/us-central1'

    operation = client.create_glossary(parent=parent, glossary=glossary)

    result = operation.result(timeout=90)
    print('Created: {}'.format(result.name))
    print('Input Uri: {}'.format(result.input_config.gcs_source.input_uri))


create_glossary()
Hope this helps someone.
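For the unidirectional glossary the question originally asked about, the Glossary message also has a language_pair field that takes the place of language_codes_set. The following is an untested sketch, assuming a google-cloud-translate release where translate.types is available (as in the code above); the function names glossary_names and create_unidirectional_glossary are hypothetical, and the us-central1 default simply follows the location that worked above.

```python
def glossary_names(project_id, location, glossary_id):
    """Return the (parent, glossary) resource names as plain strings."""
    parent = f"projects/{project_id}/locations/{location}"
    return parent, f"{parent}/glossaries/{glossary_id}"


def create_unidirectional_glossary(project_id, glossary_id, input_uri,
                                   source="en", target="hu",
                                   location="us-central1", timeout=180):
    """Create a source-to-target glossary from a two-column CSV in Cloud Storage."""
    # Imported here so glossary_names() works without the client library installed.
    from google.cloud import translate_v3 as translate

    client = translate.TranslationServiceClient()
    parent, name = glossary_names(project_id, location, glossary_id)

    glossary = translate.types.Glossary(
        name=name,
        # language_pair is used instead of language_codes_set.
        language_pair=translate.types.Glossary.LanguageCodePair(
            source_language_code=source,
            target_language_code=target,
        ),
        input_config=translate.types.GlossaryInputConfig(
            gcs_source=translate.types.GcsSource(input_uri=input_uri)
        ),
    )

    operation = client.create_glossary(parent=parent, glossary=glossary)
    return operation.result(timeout=timeout)
```

Building the parent string by hand avoids depending on a path helper that may differ between client versions.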