sql - LOAD DATA is not supported for datasource tables
Problem description
I am new to ADB and am trying to load data from a parquet file into a table in Databricks. I issued the following command:
load data local inpath '/FileStore/tables/Subsidiary__1_-2.parquet' into table Subsidiary
But it throws the following error:
Error in SQL statement: AnalysisException: LOAD DATA is not supported for datasource tables: `default`.`subsidiary`;
Can anyone explain why this happens?
Solution
According to the official Databricks documentation on LOAD DATA (highlighting mine):

Loads the data from a user-specified directory or file into a Hive SerDe table.

Per the exception message (highlighting mine), you are using a Spark SQL table (a datasource table):

AnalysisException: LOAD DATA is not supported for datasource tables: `default`.`subsidiary`;

The easiest way to verify this yourself is DESCRIBE EXTENDED: the Provider will be something other than hive (e.g. parquet).
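Since LOAD DATA only targets Hive SerDe tables, a sketch of a workaround for a datasource table is to read the parquet file with the DataFrame API and append it (the path and table name below are taken from the question; `mode("append")` is one choice, adjust as needed):

```scala
// Read the parquet file the question tried to LOAD DATA from...
val df = spark.read.parquet("/FileStore/tables/Subsidiary__1_-2.parquet")

// ...and append its rows to the existing datasource table instead.
df.write.mode("append").saveAsTable("Subsidiary")

// If the column order matches the table schema exactly,
// df.write.insertInto("Subsidiary") works as well.
```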
Demo
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.1
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9)
scala> spark.range(5).write.saveAsTable("demo")
scala> sql("DESCRIBE EXTENDED demo").show(truncate = false)
20/12/29 21:57:35 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
+----------------------------+--------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+--------------------------------------------------------------+-------+
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |demo | |
|Owner |jacek | |
|Created Time |Tue Dec 29 21:57:09 CET 2020 | |
|Last Access |UNKNOWN | |
|Created By |Spark 3.0.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Statistics |2582 bytes | |
|Location |file:/Users/jacek/dev/oss/spark/spark-warehouse/demo | |
|Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | |
|InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
+----------------------------+--------------------------------------------------------------+-------+
scala> sql("load data local inpath 'NOTICE' into table demo")
org.apache.spark.sql.AnalysisException: LOAD DATA is not supported for datasource tables: `default`.`demo`;
at org.apache.spark.sql.execution.command.LoadDataCommand.run(tables.scala:317)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
... 47 elided
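For contrast, a hypothetical follow-up in the same spark-shell session: if the target is created as a Hive SerDe table (this requires a Hive-enabled SparkSession; the file path below is illustrative, not from the original session), LOAD DATA is accepted:

```scala
// CREATE TABLE ... STORED AS PARQUET produces a Hive SerDe table,
// whose Provider shows up as 'hive' in DESCRIBE EXTENDED.
sql("CREATE TABLE demo_hive (id BIGINT) STORED AS PARQUET")

// LOAD DATA now succeeds, because the target is a Hive SerDe table.
sql("LOAD DATA LOCAL INPATH '/tmp/demo.parquet' INTO TABLE demo_hive")
```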