首页 > 解决方案 > 运行 AWS 胶水工作室 ETL 脚本时出现 ARN 角色授权错误

问题描述


py4j.protocol.Py4JJavaError: An error occurred while calling o85.getDynamicFrame.
: java.sql.SQLException: Exception thrown in awaitResult: 
    at com.databricks.spark.redshift.JDBCWrapper.com$databricks$spark$redshift$JDBCWrapper$$executeInterruptibly(RedshiftJDBCWrapper.scala:133)
    at com.databricks.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:138)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:381)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3359)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2544)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2758)
    at com.amazonaws.services.glue.JDBCDataSource.getLastRow(DataSource.scala:944)
    at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:805)
    at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:829)
    at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:94)
    at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:658)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: [Amazon](500310) Invalid operation: Not authorized to get credentials of role arn:aws:iam::**********:role/glue_etl_role
Details: 
 -----------------------------------------------
  error:  Not authorized to get credentials of role arn:aws:iam::*********:role/glue_etl_role
  code:      30000
  context:   
  query:     0
  location:  xen_aws_credentials_mgr.cpp:391
  process:   padbmaster

我们运行 AWS 胶水工作室脚本来执行一些连接和重命名操作。连接器和目标是使用 AWS 粘合目录的 Redshift。

起初的错误是 IAM 没有添加到我们添加的 redshift 中。添加 IAM 后,我们收到了这个新错误,上面写着 Not authorized to get credentials。

标签: amazon-web-servicesamazon-redshiftaws-glueaws-glue-data-catalog

解决方案


AWS Glue 作业需要具有访问数据存储权限的 IAM 角色。确保此角色有权访问您的 Amazon S3 源、目标、临时目录、脚本和作业使用的任何库。

另一个 IAM 角色与 Redshift 集群相关联(因此集群可以代表您访问其他 AWS 服务,例如,如果您的表链接到 S3 存储桶,您必须为其授予权限AmazonS3ReadOnlyAccess)。

确保我们不会混合这两个角色。


推荐阅读