Ingesting data from Spark into Hive on AWS EMR

Problem description

I'm trying to ingest data from Spark into Hive on an AWS EMR cluster. The code and the error are below. Can anyone help?

expedia_raw_df = spark.read.format(file_type) \
  .schema(schema_check) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .option("mode", "FASTFAIL") \
  .load(file_location)

expedia_raw_df.persist()

expedia_raw_df.write.mode('append').format('hive').saveAsTable(
    "RAW.expedia_raw")
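saveAsTable writes the table's files under the location the metastore has recorded for the target database (here RAW). If that location is an hdfs:// URI on a host the current cluster cannot reach, for example a NameNode from another or earlier cluster sharing the same metastore, the write fails even though the DataFrame itself is fine. That cause is an assumption, not something the question confirms. The sketch below shows one way to check it. database_location and points_to_other_hdfs are hypothetical helper names. spark.catalog.listDatabases() is a real PySpark call, and reading fs.defaultFS through the private _jsc handle is a common but unofficial trick.

```python
from typing import Optional
from urllib.parse import urlparse


def database_location(spark, db_name: str) -> Optional[str]:
    """Return the location URI the catalog reports for db_name, or None if absent."""
    # spark.catalog.listDatabases() yields objects with .name and .locationUri.
    # The Hive metastore stores database names in lower case, so match case-insensitively.
    for db in spark.catalog.listDatabases():
        if db.name.lower() == db_name.lower():
            return db.locationUri
    return None


def points_to_other_hdfs(location_uri: str, default_fs: str) -> bool:
    """True if location_uri is an hdfs:// URI naming a different host than default_fs."""
    loc = urlparse(location_uri)
    if loc.scheme != "hdfs" or loc.hostname is None:
        # s3://, file://, scheme-less and hdfs:/// paths are not flagged: they
        # either don't use HDFS or resolve against the cluster's own fs.defaultFS.
        return False
    return loc.hostname != urlparse(default_fs).hostname


# Hypothetical usage in the same pyspark session as the question:
#   loc = database_location(spark, "RAW")
#   # fs.defaultFS is read through the internal _jsc handle (not a public API).
#   default_fs = spark.sparkContext._jsc.hadoopConfiguration().get("fs.defaultFS")
#   if loc and points_to_other_hdfs(loc, default_fs):
#       print("RAW is located on another HDFS cluster:", loc)
```

If the database is flagged, one direction to investigate is pointing the database or table at a location every cluster can reach, such as S3, rather than a cluster-local HDFS path.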

I get the following error:

21/06/21 13:42:17 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 868, in saveAsTable
    self._jwrite.saveAsTable(name)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 131, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o81.saveAsTable.
: java.net.NoRouteToHostException: No Route to Host from  ip-X-X-1-8/X.X.1.8 to ip-X-X-12-X.ap-south-1.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
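The informative part of this trace is the NoRouteToHostException line. The client on ip-X-X-1-8 tried to connect to port 8020 on a different host and found no route. 8020 is the usual HDFS NameNode RPC port, so the write appears to be contacting a NameNode on that other host. The sketch below pulls the source, target host and port out of such a message. It is a hypothetical helper, and the regular expression assumes the message has the exact "from ... to host:port failed" shape shown above.

```python
import re
from typing import NamedTuple, Optional

# Usual HDFS NameNode RPC port; assumed to match this cluster's fs.defaultFS.
HDFS_NAMENODE_RPC_PORT = 8020

# Matches "... from <source> to <host>:<port> failed ..." in Hadoop IPC errors.
_ROUTE_RE = re.compile(r"from\s+(\S+)\s+to\s+([^\s:]+):(\d+)\s+failed")


class RouteFailure(NamedTuple):
    source: str
    target_host: str
    target_port: int

    @property
    def looks_like_namenode(self) -> bool:
        # Only a port-based guess: another service could listen on 8020.
        return self.target_port == HDFS_NAMENODE_RPC_PORT


def parse_route_failure(message: str) -> Optional[RouteFailure]:
    """Extract source and target from a NoRouteToHostException message, if present."""
    m = _ROUTE_RE.search(message)
    if m is None:
        return None
    return RouteFailure(m.group(1), m.group(2), int(m.group(3)))
```

If target_host is not the NameNode of the cluster running the job, the question becomes why the write is being sent there. The database or table location recorded in the metastore is a natural first place to look.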

Tags: pyspark, hive, amazon-emr

Solution

