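java - Problems with the Spark Java API, Kerberos, and Hive
Problem description
I am trying to run a Spark SQL test against a Hive table using the Spark Java API. The problem I am running into is Kerberos. Whenever I try to run the program, I get the following error message: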
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS];
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
at tester.SparkSample.lambda$0(SparkSample.java:62)
... 5 more
It is thrown on this line of code:
ss.sql("select count(*) from entps_pma.baraccount").show();
When I run the code, I can log in to Kerberos just fine and get the following message:
18/05/01 11:21:03 INFO security.UserGroupInformation: Login successful for user <kerberos user> using keytab file /root/hdfs.keytab
I even connect to the Hive Metastore:
18/05/01 11:21:06 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hiveserver>:9083
18/05/01 11:21:06 INFO hive.metastore: Connected to metastore.
But right after that I get the error. I would appreciate any direction here. This is my code:
public static void runSample(String fullPrincipal) throws IOException {
    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "true");
    System.setProperty("hive.metastore.kerberos.principal", fullPrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    System.setProperty("hadoop.security.authentication", "kerberos");
    Configuration conf = setSecurity(fullPrincipal);
    loginUser = UserGroupInformation.getLoginUser();
    loginUser.doAs((PrivilegedAction<Void>) () -> {
        SparkConf sparkConf = new SparkConf().setMaster("local");
        sparkConf.set("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse");
        sparkConf.set("hive.metastore.uris", "thrift://<hive server>:9083");
        sparkConf.set("hadoop.security.authentication", "kerberos");
        sparkConf.set("hadoop.rpc.protection", "privacy");
        sparkConf.set("spark.driver.extraClassPath",
            "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.executor.extraClassPath",
            "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.eventLog.enabled", "false");
        SparkSession ss = SparkSession
            .builder()
            .enableHiveSupport()
            .config(sparkConf)
            .appName("Jim Test Spark App")
            .getOrCreate();
        ss.sparkContext()
            .hadoopConfiguration()
            .addResource(conf);
        ss.sql("select count(*) from entps_pma.baraccount").show();
        return null;
    });
}
Solution
I'm guessing you are running Spark on YARN. You need to specify the spark.yarn.principal and spark.yarn.keytab parameters. See the "Running Spark on YARN" documentation.
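As a rough illustration of the suggestion above, here is a minimal sketch that collects the two properties in a map so they can be applied to a SparkConf or passed to spark-submit as --conf arguments. The class name KerberosSparkConf, the helper kerberosSettings, and the principal user@EXAMPLE.COM are hypothetical placeholders; the keytab path is the one shown in the login message in the question. It assumes the job really is submitted to YARN (the code in the question sets the master to "local", where these properties may have no effect), and on Spark 3.0 and later the same settings are named spark.kerberos.principal and spark.kerberos.keytab.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KerberosSparkConf {

    // Hypothetical helper: gathers the two properties named in the answer.
    // In a Spark application, each entry could be applied with
    // sparkConf.set(key, value) before the SparkSession is created.
    public static Map<String, String> kerberosSettings(String principal, String keytabPath) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("spark.yarn.principal", principal);
        settings.put("spark.yarn.keytab", keytabPath);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder principal (an assumption); keytab path taken from the question's log.
        Map<String, String> settings =
            kerberosSettings("user@EXAMPLE.COM", "/root/hdfs.keytab");

        // Print the settings in the form spark-submit accepts as --conf arguments.
        settings.forEach((key, value) ->
            System.out.println("--conf " + key + "=" + value));
    }
}
```

spark-submit also accepts the --principal and --keytab options, which set the same values from the command line.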